COMPARATIVE GENOMIC HYBRIDIZATION (CGH)

FIELD OF THE INVENTION

This invention relates generally to the field of cytogenetics, and more particularly
to the field of molecular cytogenetics. It concerns methods of determining the relative
copy numbers of different nucleic acid sequences in a subject cell or cell population
and/or comparing the nucleic acid sequence copy numbers of substantially identical
sequences in several cells or cell populations as a function of the location of those
sequences in a reference genome. For instance, the methods of this invention provide the
means to determine the relative number of copies of nucleic acid sequences in one or 
more subject genomes (for example, the DNA of one tumor cell or a number of cells 
from a subregion of a solid tumor) or portions thereof as a function of the location of 
those sequences in a reference genome (for example, a normal human metaphase spread). 



Further, the invention provides methods of determining the absolute copy number of 
nucleic acid sequences in a subject cell or cell population. 

Although the examples herein concern human cells and the language is primarily 
directed to human concerns, the concept of this invention is applicable to genomes from 
any plant or animal. The genomes compared need only be related closely enough to have 
sufficient substantially identical sequences for a meaningful analysis. For example, a 
human genome and that of another primate could be compared according to the methods 
of this invention. 

BACKGROUND OF THE INVENTION 

Chromosome abnormalities are associated with genetic disorders, degenerative
diseases, and exposure to agents known to cause degenerative diseases, particularly
cancer [German, "Studying Human Chromosomes Today," American Scientist, 58: 182-201
(1970); Yunis, "The Chromosomal Basis of Human Neoplasia," Science, 221: 227-236
(1983); and German, "Clinical Implication of Chromosome Breakage," in Genetic
Damage in Man Caused by Environmental Agents, Berg, Ed., pgs. 65-86 (Academic
Press, New York, 1979)]. Chromosomal abnormalities can be of several types, including:
extra or missing individual chromosomes, extra or missing portions of a chromosome 
(segmental duplications or deletions), breaks, rings and chromosomal rearrangements, 
among others. Chromosomal or genetic rearrangements include translocations (transfer of 
a piece from one chromosome onto another chromosome), dicentrics (chromosomes with 
two centromeres), inversions (reversal in polarity of a chromosomal segment), insertions, 
amplifications, and deletions. 



Detectable chromosomal abnormalities occur with a frequency of one in 
every 250 human births. Abnormalities that involve deletions or additions of 
chromosomal material alter the gene balance of an organism and generally lead to fetal 
death or to serious mental and physical defects. Down syndrome can be caused by
having three copies of chromosome 21 instead of the normal 2. This syndrome is an
example of a condition caused by abnormal chromosome number, or aneuploidy. Down
syndrome can also be caused by a segmental duplication of a subregion on chromosome
21 (such as 21q22), which can be present on chromosome 21 or on another chromosome.
Edward syndrome (18+), Patau syndrome (13+), Turner syndrome (XO) and Klinefelter
syndrome (XXY) are among the most common numerical aberrations. [Epstein, The
Consequences of Chromosome Imbalance: Principles, Mechanisms and Models
(Cambridge Univ. Press 1986); Jacobs, Am. J. Epidemiol., 105: 180 (1977); and Lubs
et al., Science, 169: 495 (1970).]

Retinoblastoma (del 13q14), Prader-Willi syndrome (del 15q11-q13), Wilms'
tumor (del 11p13) and Cri-du-chat syndrome (del 5p) are examples of important
disease-linked structural aberrations. [Nora and Fraser, Medical Genetics: Principles and
Practice (Lea and Febiger, 1989).]

One of the critical endeavors in human medical research is the discovery of 
genetic abnormalities that are central to adverse health consequences. In many cases, 
clues to the location of specific genes and/or critical diagnostic markers come from
identification of portions of the genome that are present at abnormal copy numbers. For
example, in prenatal diagnosis, as indicated above, extra or missing copies of whole
chromosomes are the most frequently occurring genetic lesion. In cancer, deletion or
multiplication of copies of whole chromosomes or chromosomal segments, and higher
level amplifications of specific regions of the genome, are common occurrences.

Much of such cytogenetic information has come over the last several 
decades from studies of chromosomes with light microscopy. For the past thirty years 
cytogeneticists have studied chromosomes in malignant cells to determine sites of 
recurrent abnormality to glean hints to the location of critical genes. Even though 
cytogenetic resolution is limited to several megabases by the complex packing of DNA 
into the chromosomes, this effort has yielded crucial information. Among the strengths 
of such traditional cytogenetics is the ability to give an overview of an entire genome at 
one time, permitting recognition of structural abnormalities such as inversions and 
translocations, as well as deletions, multiplications, and amplifications of whole 
chromosomes or portions thereof. With the coming of cloning and detailed molecular
analysis, recurrent translocation sites have been recognized as involved in the formation
of chimeric genes such as the BCR-ABL fusion in chronic myelogenous leukemia
(CML); deletions have been recognized as frequently indicating the location of tumor 
suppressor genes; and amplifications have been recognized as indicating overexpressed 
genes. 

Conventional procedures for genetic screening and biological dosimetry 
involve the analysis of karyotypes. A karyotype is the particular chromosome 
complement of an individual or of a related group of individuals, as defined both by the 
number and morphology of the chromosomes usually in mitotic metaphase. It includes
such things as total chromosome number, copy number of individual chromosome types 
(e.g., the number of copies of chromosome X), and chromosomal morphology, e.g., as 
measured by length, centromeric index, connectedness, or the like. Karyotypes are
conventionally determined by chemically staining an organism's metaphase, prophase or
otherwise condensed (for example, by premature chromosome condensation) 
chromosomes. Condensed chromosomes are used because, until recently, it has not been 
possible to visualize interphase chromosomes due to their dispersed condition and the lack 
of visible boundaries between them in the cell nucleus. 

A number of cytological techniques based upon chemical stains have been 
developed which produce longitudinal patterns on condensed chromosomes, generally 
referred to as bands. The banding pattern of each chromosome within an organism
usually permits unambiguous identification of each chromosome type [Latt, "Optical
Studies of Metaphase Chromosome Organization," Annual Review of Biophysics and
Bioengineering, 5: 1-37 (1976)].

Unfortunately, such conventional banding analysis requires cell culturing 
and preparation of high quality metaphase spreads, which is time consuming and labor 
intensive, and frequently difficult or impossible. For example, cells from many tumor 
types are difficult to culture, and it is not clear that the cultured cells are representative of 
the original tumor cell population. Fetal cells capable of being cultured need to be
cultured for several weeks to obtain enough metaphase cells for analysis. Over the past
decade, methods of in situ hybridization have been developed that permit analysis of
intact cell nuclei (interphase cytogenetics). Probes for chromosome centromeres, whole
chromosomes, and chromosomal segments down to the size of genes, have been 
developed. With the use of such probes, the presence or absence of specific 
abnormalities can be very efficiently determined; however, it is tedious to test for 
numerous possible abnormalities or to survey to discover new regions of the genome that 
are altered in a disease. 



The present invention, Comparative Genomic Hybridization (CGH) 
[formerly called Copy Ratio Reverse Cytogenetics (CRRC) among other names] provides 
powerful methods to overcome many of the limitations of existing cytogenetic techniques. 
When CGH is applied, for example, in the fields of tumor cytogenetics and prenatal 
diagnosis, it provides methods to determine whether there are abnormal copy numbers of 
nucleic acid sequences anywhere in the genome of a subject tumor cell or fetal cell or the 
genomes from representative cells from a tumor cell population or from a number of fetal 
cells, without having to prepare condensed chromosome spreads from those cells. Thus, 
cytogenetic abnormalities involving abnormal copy numbers of nucleic acid sequences, 
specifically amplifications and/or deletions, can be found by the methods of this invention 
in the format of an immediate overview of an entire genome or portions thereof. More 
specifically, CGH provides methods to compare and map the frequency of nucleic acid 
sequences from one or more subject genomes or portions thereof in relation to a reference 
genome. It permits the determination of the relative number of copies of nucleic acid 
sequences in one or more subject genomes (for example, those of tumor cells) as a 
function of the location of those sequences in a reference genome (for example, that of a 
normal human cell). 

Gene amplification is one of several mechanisms whereby cells can change 
phenotypic expression when increased amounts of specific proteins are required, for 
example, during development [Spradling and Mahowald, PNAS (USA), 77: 1096-1100
(1980); Glover et al., PNAS (USA), 79: 2947-2951 (1982)], or during an environmental
challenge when increased amounts of specific proteins can impart resistance to cytotoxic
agents [Melera et al., J. Biol. Chem., 255: 7024-7028 (1980); Beach and Palmiter, PNAS
(USA), 78: 2110-2114 (1981)].



A major limitation of Southern analysis and related conventional techniques 
for analysis of gene amplification is that only specific sites are studied, leaving the vast
majority of the genome unexamined. Conventional cytogenetic studies, on the other
hand, provide a broad survey of the genome but provide little information about genes 
that may be involved in amplification events. However, the procedures of this invention 
overcome those limitations. This invention can be used to show the normal chromosomal 
locations of all regions of a genome that are amplified or deleted wherein the size of the 
regions that can be detected is limited only by the resolution of the microscopy used and 
the organization of DNA in condensed chromosomes. Thus, this invention provides,
among other uses, the ability to study gene amplifications and deletions and their roles in
tumor development, progression and response to therapy more thoroughly than was
possible previously. The methods of CGH are sufficiently rapid and simple that large
numbers of subject nucleic acids, for example from many tumors, can be analysed in
studies for gene amplification and deletion. 

The karyotypic heterogeneity in solid tumors can be extreme. 
Identification of commonly occurring chromosomal changes by analysis of metaphase 
spreads is often difficult or impossible using conventional banding analysis because of the 
complexity of the rearrangements and because of the poor quality of the metaphase 
preparations. CGH overcomes that limitation in that the tumor nucleic acid can be 
studied without the requirement of preparing metaphase spreads. Since CGH can 
probably be performed on single cells by amplifying the nucleic acid therefrom, CGH can 
be used to investigate the heterogeneity of tumors by studying representative cells from 
different cell populations of the tumor. Alternatively, CGH of nucleic acid from a tumor 
extracted in a bulk extraction process from many cells of the tumor can reveal
consistencies within the apparent heterogeneity. For example, the same amplified
sequences may appear as homogeneously staining regions (HSRs) and/or double minute
chromosomes (DMs) in one tumor cell but as an extension of a chromosome arm in
another tumor cell. Thus, order from the apparent randomness may be realized by CGH.

Montgomery et al., PNAS (USA), 80: 5724-5728 (September 1983),
concerns the hybridization of labeled Cot fractionated DNAs from tumor cell lines (a Cot 
fraction from which the high copy repeats, low copy repeats and single copy sequences 
were substantially removed) to metaphase spreads from said tumor cell lines. Basically, 
Montgomery et al. mapped the positions of nucleic acid sequences from tumor cell lines 
that are very highly amplified back to tumor cell line genomes. 

Total genomic DNA from one species has been used in in situ hybridization 
to discriminate in hybrid cells between chromosomes of that species and of a different 
species on the basis of the signal from the high copy repetitive sequences. [Pinkel et al.,
PNAS (USA), 83: 2934 (1986); Manuelidis, Hum. Genet., 71: 288 (1985); and Durnam
et al., Somatic Cell Molec. Genet., 11: 571 (1985).] Landegent et al., Hum. Genet.,
77: 366-370 (1987), eliminated highly repetitive sequences, like Alu and Kpn fragments,
from whole cosmid cloned genomic sequences by blocking the highly repetitive sequences 
with Cot-1 DNA. The resulting probe was used for in situ hybridization. 

European Patent Application Publication No. 430,402 (published June 5, 
1991) describes methods and compositions for chromosome-specific painting, that is, 
methods and compositions for staining chromosomes based upon nucleic acid sequence 
employing high complexity nucleic acid probes. In general in the chromosome-specific 
painting methods, repetitive sequences not specific to the targeted nucleic acid sequences 



are removed from the hybridization mixture and/or their hybridization capacity disabled, 
often by blocking with unlabeled genomic DNA or with DNA enriched for high copy 
repetitive sequences as is Cot-1 [commercially available from Bethesda Research 
Laboratory, Gaithersburg, MD (USA)]. Pinkel et al., PNAS (USA), 85: 9138-9142
(1988) also describes aspects of chromosome-specific painting, as does International
Publication No. WO 90/05789 (published May 31, 1990, entitled "In situ Suppression
Hybridization and Uses Therefor").

Chromosome-specific repeat sequence probes and chromosome-specific
painting probes can be hybridized in situ to interphase nuclei as well as metaphase
spreads and provide information about the genetic state of the individual targeted
genomes. A limitation of such hybridizations is that cytogenetic information is only
provided from the regions to which the probes bind. Such hybridizations are very useful
for determining if a particular abnormality is present, for example, the deletion of a
specific gene or a duplication among other abnormalities, but it is laborious to search for
currently unknown abnormalities on a region by region basis.

Other methods of searching for unknown genetic abnormalities similarly 
require a lot of work. For example, looking for loss of heterozygosity in tumor cells
requires the hybridization of many probes to Southern blots of tumor and normal cell
DNA. The instant invention, Comparative Genomic Hybridization (CGH), provides
methods to overcome many of the limitations of the existing cytogenetic
techniques.

Saint-Ruf et al., Genes, Chromosomes & Cancer, 2: 18-26 (1990) state at
page 24 that

    Human breast carcinomas are characterized by two
    sets of molecular anomalies. Firstly, some protooncogenes,
    such as MYC, INT2, HST, and ERBB2, are frequently
    found either amplified or overexpressed. . . . Secondly, loss
    of heterozygosity has been reported, especially for 1p, 11, 13
    and 17 . . .

    Human breast carcinomas are also characterized
    cytogenetically by various anomalies that may be the chromosomal
    counterpart of the molecular anomalies: regions of amplification
    (HSRs) are found in more than one-third of the tumors . . . and
    various deletions, affecting, e.g., 1p, 11p, 11q, 13, and 17p, are
    found recurrently. . . .

[Citations omitted.] Saint-Ruf et al. concluded from the reported experiments that
although amplification of genetic material is a frequent and probably important event in
breast carcinogenesis, the relevant genes involved in such amplifications remain
unknown but do not seem to correspond to the proto-oncogenes commonly considered
important in breast cancer.

Since HSRs in tumors are most often not at the site of the amplified gene(s)
in normal cells, standard cytogenetics does not yield any information that could assist
with identification of the gene(s). CGH on the other hand permits mapping them in the
normal genome, a major step towards their identification.

Dutrillaux et al., Cancer Genet. Cytogenet., 49: 203-217 (1990) report (at
page 203) that "[a]lthough human breast carcinomas are among the most frequent
malignant tumors, cytogenetic data remain scarce, probably because of their great
variability and of the frequent difficulty of their analysis." In their study of "30 cases
with relatively simple karyotypes to determine which anomalies occur the most frequently
and, in particular, early during tumor progression" (p. 203), they concluded that "trisomy
1q and monosomy 16q are early chromosomal changes in breast cancer, whereas other
deletions and gain of 8q are clearly secondary events." [Abstract, p. 203.] Dutrillaux et
al. further state (at page 216) that deletions within tumor suppressor genes "characterize
tumor progression of breast cancer."

It is believed that many solid tumors, such as breast cancer, progress from 
initiation to metastasis through the accumulation of several genetic aberrations. [Smith et
al., Breast Cancer Res. Treat., 18 Suppl. 1: S5-14 (1991); van de Vijver and Nusse,
Biochim. Biophys. Acta, 1072: 33-50 (1991); Sato et al., Cancer Res., 50: 7184-7189
(1990).] Such genetic aberrations, as they accumulate, may confer proliferative
advantages, genetic instability and the attendant ability to evolve drug resistance rapidly,
and enhanced angiogenesis, proteolysis and metastasis. The genetic aberrations may
affect either recessive "tumor suppressor genes" or dominantly acting oncogenes.
Deletions and recombination leading to loss of heterozygosity (LOH) are believed to play
a major role in tumor progression by uncovering mutated tumor suppressor alleles.

Dominantly acting genes associated with human solid tumors typically exert
their effect by overexpression or altered expression. Gene amplification is a common
mechanism leading to upregulation of gene expression. [Stark et al., Cell, 57: 901-908
(1989).] Evidence from cytogenetic studies indicates that significant amplification occurs
in over 50% of human breast cancers. [Saint-Ruf et al., supra.] A variety of oncogenes
have been found to be amplified in human malignancies. Examples of the amplification
of cellular oncogenes in human tumors are shown in Table 1 below.

TABLE 1

Amplified    Tumor                                Degree of
Gene                                              Amplification

c-myc        Promyelocytic leukemia               20x
             cell line, HL60
             Small-cell lung                      5-30x
             carcinoma cell lines

N-myc        Primary neuroblastomas               5-1000x
             (stages III and IV)
             and neuroblastoma
             cell lines
             Retinoblastoma cell                  10-200x
             line and primary
             tumors
             Small-cell lung carcinoma            50x
             cell lines and tumors

L-myc        Small-cell lung carcinoma            10-20x
             cell lines and tumors

c-myb        Acute myeloid leukemia               5-10x
             Colon carcinoma cell lines           10x

c-erbB       Epidermoid carcinoma cell            30x
             Primary gliomas

c-K-ras-2    Primary carcinomas of lung,          4-20x
             colon, bladder, and rectum

N-ras        Mammary carcinoma cell               5-10x
             line

DM or HSR Present (marks as they appear in the source column, in order): +, ?, +, ?, ?, ?

SOURCE: modified from Varmus, Ann. Rev. Genetics, 18: 553-612 (1984) [cited in
Watson et al., Molecular Biology of the Gene (4th ed.; Benjamin/Cummings Publishing
Co., 1987)]

Chromosomal deletions involving tumor suppressor genes may play an
important role in the development and progression of solid tumors. The retinoblastoma
tumor suppressor gene (Rb-1), located in chromosome 13q14, is the most extensively
characterized tumor suppressor gene [Friend et al., Nature, 323: 643 (1986); Lee et al.,
Science, 235: 1394 (1987); Fung et al., Science, 236: 1657 (1987)]. The Rb-1 gene
product, a 105 kDa nuclear phosphoprotein, apparently plays an important role in cell
cycle regulation [Lee et al., supra (1987); Howe et al., PNAS (USA), 87: 5883 (1990)].
Altered or lost expression of the Rb protein is caused by inactivation of both gene alleles
either through a point mutation or a chromosomal deletion. Rb-1 gene alterations have
been found to be present not only in retinoblastomas [Friend et al., supra (1986); Lee et
al., supra (1987); Fung et al., supra (1987)] but also in other malignancies such as
osteosarcomas [Friend et al., supra (1986)], small cell lung cancer [Hensel et al., Cancer
Res., 50: 3067 (1990); Rygaard et al., Cancer Res., 50: 5312 (1990)] and breast cancer
[Lee et al., Science, 241: 218 (1988); T'Ang et al., Science, 242: 263 (1988); Varley et
al., Oncogene, 4: 725 (1989)]. Restriction fragment length polymorphism (RFLP) studies
have indicated that such tumor types have frequently lost heterozygosity at 13q, suggesting
that one of the Rb-1 gene alleles has been lost due to a gross chromosomal deletion
[Bowcock et al., Am. J. Hum. Genet., 46: 12 (1990)].

The deletion of the short arm of chromosome 3 has been associated with 
several cancers, for example, small cell lung cancer, renal and ovarian cancers; it has 
been postulated that one or more putative tumor suppressor genes is or are located in the 
p region of chromosome 3 (ch. 3p) [Minna et al., Symposia on Quantitative Biology, Vol.
LI: 843-853 (CSH Lab 1986); Cohen et al., N. Eng. J. Med., 301: 592-595 (1979);
Bergerheim et al., Cancer Res., 49: 1390-1396 (1989); Whang-Peng et al., Can. Genet.
Cytogenet., 11: 91-106 (1984); and Trent et al., Can. Genet. Cytogenet., 14: 153-161
(1985)].

The above-indicated collection of amplified and deleted genes is far from 
complete. As the Saint-Ruf et al. study (supra) of oncogene amplification in cells
showing cytogenetic evidence of amplification, such as double minutes (DMs) or
homogeneously staining regions (HSRs), indicated, the amplified genes were not known
oncogenes in most cases. As Dutrillaux et al., supra, indicated, "cytogenetic data remain
scarce" for "the most frequent malignant tumors," breast carcinomas.

Discovery of genetic changes involved in the development of solid tumors has 
proven difficult. Karyotyping is impeded by the low yield of high quality metaphases and 
the complex nature of chromosomal changes [Teyssier, J.R., Cancer Genet. Cytogenet.,
31: 103 (1989)]. Although molecular genetic studies of isolated tumor DNA have been
more successful and permitted detection of common regions of allelic loss, mutation or
amplification [Fearon et al., Cell, 61: 759 (1990); Sato et al., Cancer Res., 50: 7184
(1990); Alitalo et al., Adv. Cancer Res., 47: 235 (1986); and Schwab and Amler, Genes
Chrom. Cancer, 1: 181 (1990)], such molecular methods are highly focused, targeting
one specific gene or chromosome region at a time, and leaving the majority of the 
genome unexamined. 

Thus, a research tool leading to the identification of amplified and deleted 
genes and providing more cytogenetic data regarding tumors, especially tumor
progression and invasiveness, is needed in tumor cytogenetics. CGH provides such a
molecular cytogenetic research tool. 

CGH facilitates the genetic analysis of tumors in that it provides a copy number 
karyotype of the entire genome in a single step. Regions of tumor DNA gain and loss are 
mapped directly onto normal chromosomes. Comparisons of primary tumors with their
metastases by CGH should be informative concerning cancer progression. 

The ability to survey the whole genome in a single hybridization is a distinct 
advantage over allelic loss studies by restriction fragment length polymorphism (RFLP) 
that target only one locus at a time. RFLP is also restricted by the availability and 
informativeness of polymorphic probes. 




The copy number karyotype determined by CGH may become as important for 
diagnostic and/or prognostic assessment of solid tumors as conventional karyotyping now 
is for hematologic malignancies. [Yunis, J.J., Science, 221: 227 (1983); Solomon et al.,
Science, 254: 1153 (1991).]

SUMMARY OF THE INVENTION 

Comparative Genomic Hybridization (CGH) employs the kinetics of in situ 
hybridization to compare the copy numbers of different DNA or RNA sequences from a
sample, or the copy numbers of different DNA or RNA sequences in one sample to the
copy numbers of the substantially identical sequences in another sample. In many useful
applications of CGH, the DNA or RNA is isolated from a subject cell or cell population.
The comparisons can be qualitative or quantitative. Procedures are described that permit
determination of the absolute copy numbers of DNA sequences throughout the genome of
a cell or cell population if the absolute copy number is known or determined for one or
several sequences. The different sequences are discriminated from each other by the
different locations of their binding sites when hybridized to a reference genome, usually
metaphase chromosomes but in certain cases interphase nuclei. The copy number 
information originates from comparisons of the intensities of the hybridization signals 
among the different locations on the reference genome. 
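
As a worked illustration of the calibration idea stated above, the following minimal sketch scales a relative intensity profile so that one locus of known absolute copy number takes that value. It assumes, as an idealization, that signal intensity is proportional to copy number at every locus. The function name, the locus labels and the intensity values are hypothetical and are not taken from this disclosure.

```python
def absolute_copy_numbers(intensity, calib_locus, calib_copies):
    """Scale relative signal intensities to absolute copy numbers.

    Assumption (idealized): intensity is proportional to copy number,
    so a single locus of known copy number fixes the scale.
    """
    ref = intensity[calib_locus]
    if ref <= 0:
        raise ValueError("calibration locus must have a positive signal")
    return {locus: value * calib_copies / ref
            for locus, value in intensity.items()}


# Hypothetical intensities (arbitrary units) at three reference loci.
profile = {"1p": 100.0, "8q": 250.0, "13q": 50.0}

# Suppose 1p is independently known to be present at 2 copies per cell.
copies = absolute_copy_numbers(profile, "1p", 2)
print(copies)
```

Under these assumed values the locus with 2.5 times the calibration signal is assigned 5 copies and the locus with half the calibration signal is assigned 1 copy.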

Two representative basic approaches are employed in CGH as illustrated herein
for the analysis of subject DNAs. In an example of the first approach, genomic DNA
from a subject cell or cell population is isolated, labeled and hybridized to
reference chromosomes, usually in metaphase. In an example of the second approach,
genomic DNAs from two or more subject cells or cell populations are isolated,
differentially labeled, and hybridized to reference chromosomes, usually in metaphase.

The CGH methods of this invention can be qualitative and/or quantitative. A 
particular utility of CGH is for analysing DNA sequences from subject cell(s) or cell 
populations, for example from clinical specimens including tumor and fetal tissues. 

An important utility of CGH is to find regions in normal genomes which when 
altered in sequence copy number contribute to disease, as for example, cancer or birth 
defects. For example, regions at elevated copy number may contain oncogenes, and 
regions present at decreased copy number may contain tumor suppressor genes. 
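
The screening use just described can be sketched as a simple filter over copy number ratios. The example below is illustrative only: the cutoff values of 1.25 and 0.75, the function name and the region labels with their ratios are assumptions chosen for the example, not values given in this disclosure.

```python
def flag_candidate_regions(ratio_by_region, gain=1.25, loss=0.75):
    """Sort regions into candidate gains and candidate losses.

    The cutoffs are illustrative assumptions, not values from the text.
    """
    gains = [r for r, v in ratio_by_region.items() if v > gain]
    losses = [r for r, v in ratio_by_region.items() if v < loss]
    return gains, losses


# Hypothetical subject-to-normal copy number ratios by reference region.
ratios = {"8q24": 2.4, "1q": 1.1, "13q14": 0.5, "17p": 0.6}

gains, losses = flag_candidate_regions(ratios)
print("candidate oncogene regions:", gains)
print("candidate tumor suppressor regions:", losses)
```

Regions flagged in this way would only be candidates for further study; the filter does not by itself identify a gene.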

A representative CGH method is for comparing copy numbers of different 
DNA sequences in a subject cell or cell population comprising the steps of: 

a) extracting the DNA from the subject cell or from a number of cells of the 
subject cell population; 

b) amplifying said extracted subject DNA, if necessary; 

c) labeling the subject DNA; 

d) hybridizing said labeled subject DNA in situ to reference metaphase 
chromosomes after substantially removing from the labeled DNA those repetitive 
sequences that could bind to multiple loci in the reference metaphase chromosomes, 
and/or after blocking the binding sites for those repetitive sequences in the reference 
metaphase chromosomes by prehybridization with appropriate blocking nucleic acids, 
and/or blocking those repetitive sequences in the labeled DNA by prehybridization with 
appropriate blocking nucleic acid sequences, and/or including such blocking nucleic acid 
sequences for said repetitive sequences during said hybridization, wherein the DNA 
sequences in the labeled subject DNA that bind to single copy sequences in the reference 
metaphase chromosomes are substantially retained, and those single copy DNA sequences
as well as their binding sites in the reference metaphase chromosomes remain 
substantially unblocked both before and during the hybridization; 

e) rendering the bound, labeled DNA sequences visualizable, if necessary;

f) observing and/or measuring the intensity of the signal from the labeled 
subject DNA sequences as a function of position on the reference metaphase 
chromosomes; and 

g) comparing the copy numbers of different DNA sequences of the subject 
DNA by comparing the signal intensities at different positions on the reference metaphase 
chromosomes, wherein the greater the signal intensity at a given position, the greater the 
copy number of the sequences in the subject DNA that bind at that position. An 
analogous method can be performed wherein the subject nucleic acid is RNA. 
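
Steps (f) and (g) above amount to comparing intensities measured at different positions on the reference chromosomes. A minimal sketch of that comparison follows. It assumes the intensities have already been measured and keyed by position; expressing each value relative to the median intensity is a convenience adopted for this example and is not a step recited above. All names and values are hypothetical.

```python
from statistics import median


def rank_positions(signal):
    """Order positions from highest to lowest signal intensity."""
    return sorted(signal, key=signal.get, reverse=True)


def relative_to_median(signal):
    """Express each intensity relative to the median intensity.

    Using the median as a baseline is an assumption of this sketch.
    """
    base = median(signal.values())
    if base <= 0:
        raise ValueError("median intensity must be positive")
    return {pos: value / base for pos, value in signal.items()}


# Hypothetical intensities (arbitrary units) at four reference positions.
signal = {"1q": 120.0, "8q": 300.0, "13q": 60.0, "17p": 120.0}

order = rank_positions(signal)
relative = relative_to_median(signal)
print(order)
print(relative)
```

Here the position with the strongest signal ranks first, corresponding to the sequences inferred to be present at the greatest copy number in the subject DNA.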

Further, disclosed are methods wherein two or more subject nucleic acids are 
analysed by CGH. Exemplary methods are those wherein the subject nucleic acids are 
DNA sequences from a subject cell or cell population. Analogous methods may be 
performed wherein the subject nucleic acids are RNA. Such an exemplary method is that 
for comparing copy numbers of different DNA sequences in one subject cell or cell 
population relative to copy numbers of substantially identical sequences in another cell or 
cell population, said method comprising the steps of: 

a) extracting the DNA from both of the subject cells or cell populations; 

b) amplifying said extracted subject DNAs, if necessary;

c) differentially labeling the subject DNAs;

d) hybridizing said differentially labeled subject
DNAs in situ to reference metaphase chromosomes after substantially removing
from the labeled DNAs those repetitive sequences that could bind to multiple loci in the 
reference metaphase chromosomes, and/or after blocking the binding sites for those 
repetitive sequences in the reference metaphase chromosomes by prehybridization with 
appropriate blocking nucleic acids, and/or blocking those repetitive sequences in the 
labeled DNA by prehybridization with appropriate blocking nucleic acid sequences, 
and/or including such blocking nucleic acid sequences for said repetitive sequences during 
said hybridization; 

e) rendering the bound, differentially labeled DNA sequences visualizable, if necessary;

f) observing and/or measuring the intensities of the signals from each subject 
DNA, and the relative intensities, as a function of position along the reference metaphase 
chromosomes; and 

g) comparing the relative intensities among different locations along the 
reference metaphase chromosomes wherein the greater the intensity of the signal at a 
location due to one subject DNA relative to the intensity of the signal due to the other 
subject DNA at that location, the greater the copy number of the sequence that binds at 
that location in the first subject cell or cell population relative to the copy number of the 
substantially identical sequence in the second subject cell or cell population that binds at 
that location. 

Further disclosed are methods of quantitatively comparing copy numbers of 
different DNA sequences in one subject cell or cell population relative to copy numbers 
of substantially identical sequences in another subject cell or cell population. A
representative method is that comprising steps (a) through (e) of the method immediately
detailed above and the following steps of: 

f. measuring the intensities of the signals from each of the bound subject 
DNAs and calculating the ratio of the intensities as a function of position along the 
reference metaphase chromosomes to form a ratio profile; and 

g. quantitatively comparing the ratio profile among different locations along 
the reference metaphase chromosomes, said ratio profile at each location being 
proportional to the ratio of the copy number of the DNA sequences that bind to that
location in the first subject cell or cell population to the copy number of substantially 
identical sequences in the second cell or cell population. 
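
Steps (f) and (g) can be sketched as follows. This is only an illustration, assuming the two signals have already been sampled at matching positions along one reference chromosome; the function name and the intensity values are hypothetical and are not data from this disclosure.

```python
def ratio_profile(first_signal, second_signal):
    """Return the position-by-position ratio of two intensity profiles.

    Positions where the second signal is zero yield None, because the
    ratio is undefined there.
    """
    if len(first_signal) != len(second_signal):
        raise ValueError("profiles must be sampled at the same positions")
    return [a / b if b else None for a, b in zip(first_signal, second_signal)]


# Hypothetical intensities at four positions along one reference chromosome.
first = [10.0, 20.0, 10.0, 5.0]
second = [10.0, 10.0, 10.0, 10.0]
profile = ratio_profile(first, second)
```

In this sketch a value above the typical ratio suggests a relative gain in the first subject DNA at that position, and a value below it suggests a relative loss.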

Said representative methods can further comprise comparing copy numbers of 
different DNA sequences in more than two subject DNAs wherein the comparing is done 
pairwise between the signals from each subject DNA. 

This invention further discloses methods to determine the ratio of copy numbers 
of different DNA sequences in one subject cell or cell population to copy numbers of 
substantially identical sequences in another cell or cell population wherein the steps of (a) 
through (f) as described above are performed as well as the following steps: 

g. determining the average copy number of a calibration sequence in both 
subject cells or cell populations, said calibration sequence being substantially identical to 
a single copy sequence in the reference metaphase cells; and 

h. normalizing the ratio profile calculated in (f) so that at the calibration 
position, the ratio profile is equal to the ratio of the average copy numbers determined in 
(g), the normalized ratio profile at any other location along the reference metaphase 
chromosomes thereby giving the ratio of the copy numbers of the DNA sequences in the
two subject DNAs that bind at that location. That method can be extended to further
subject nucleic acids as for example determining the ratio of copy numbers of DNA 
sequences in more than two subject DNAs wherein the comparing is done pairwise 
between signals from each subject DNA. 
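
The normalization of steps (g) and (h) amounts to rescaling the ratio profile by a single factor. A minimal sketch, assuming the profile is a list of ratios and the calibration position is given as a list index (both hypothetical conventions chosen for illustration):

```python
def normalize_to_calibration(profile, calibration_index, calibration_ratio):
    """Scale a ratio profile so that it equals calibration_ratio at the
    calibration position; all other positions are scaled by the same factor."""
    observed = profile[calibration_index]
    if observed == 0:
        raise ValueError("ratio at the calibration position must be non-zero")
    scale = calibration_ratio / observed
    return [value * scale for value in profile]


# Hypothetical raw ratio profile; the calibration sequence maps to index 0.
raw = [0.8, 1.6, 0.8, 0.4]
# Suppose the calibration sequence averages 2 copies in each subject cell
# population, so the copy number ratio at that position is 2 / 2.
normalized = normalize_to_calibration(raw, 0, 2 / 2)
```

After rescaling, the value at every other position reads directly as the copy number ratio between the two subject DNAs at that position.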

Further disclosed are methods for comparing copy numbers of different DNA 
sequences in a test cell or cell population, said method comprising applying steps (a) 
through (e) of the above-described methods and 

f. observing and/or measuring the intensities of the signal from each subject 
DNA, and the relative intensities, as a function of position along the reference metaphase 
chromosomes wherein one of the subject cells or cell populations is the test cell or cell
population and the other is a normal cell or cell population; and 

g. comparing the relative intensities among different locations along the
reference metaphase chromosomes, wherein the greater the relative intensity at a location, 
the greater the copy number of the sequence in the test cell or cell population that binds 
to that location, except for sex chromosomes where the comparison needs to take into 
account the differences in copy numbers of sequences in the sex chromosomes in relation 
to those on the autosomes in the normal subject cell or cell population. 

A related representative method is that for comparing the copy number of 
different DNA sequences in a test cell or cell population comprising applying steps (a)
through (e) of the above described methods wherein one of the subject cells or cell 
populations is the test cell or cell population, and the other is a standard cell or cell 
population wherein the copy numbers of the DNA sequences that bind to different 
positions on the reference metaphase chromosomes are known, and the following steps:

f. measuring the intensities of the signals from each of the bound subject
DNAs and calculating the ratio of intensities as a function of position along the reference 
metaphase chromosomes to form a ratio profile; 

g. adjusting the ratio profile at each location along the reference metaphase 
chromosomes by multiplying the ratio profile by the known copy number of DNA 
sequences in the standard cell or cell population that bind there; and 

h. comparing the adjusted ratio profiles at different locations along the 
reference metaphase chromosomes wherein the greater the adjusted ratio profile at a 
location, the greater the copy number of the DNA sequence in the test cell or cell 
population that binds there. 
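
The adjustment in step (g) is a position-by-position multiplication. A minimal sketch, assuming a standard whose copy numbers are known at each sampled position (the values below are hypothetical, for example a normal male standard with two copies of autosomal sequences and one copy of X-linked sequences):

```python
def adjust_ratio_profile(profile, standard_copy_numbers):
    """Multiply each test/standard ratio by the known copy number of the
    standard at the same position."""
    if len(profile) != len(standard_copy_numbers):
        raise ValueError("profile and standard must cover the same positions")
    return [r * n for r, n in zip(profile, standard_copy_numbers)]


# Hypothetical ratios at two autosomal positions and one X-linked position.
ratios = [1.0, 1.5, 2.0]
standard = [2, 2, 1]
adjusted = adjust_ratio_profile(ratios, standard)
```

Here the raw ratio at the third position is the highest, but after adjustment the second position has the greatest value, which is why the comparison in step (h) is made on the adjusted profile.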

Another related representative method is that for determining the ratios of the 
copy numbers of different DNA sequences in a test cell or cell population, said method 
comprising applying steps (a) through (f) of the immediately above-described method and
the steps of adjusting the ratio profile at each location along the reference metaphase 
chromosomes by multiplying the ratio profile by the known copy number of sequences 
that bind there; and calculating the ratio of the copy number of a DNA sequence in the 
test cell or cell population that binds to one location on the reference metaphase 
chromosomes to the copy number of a sequence that binds to another location by dividing 
the adjusted ratio profile at the location of the first sequence by that at the location of the 
second. Said representative method can be extended to determine the copy number of 
different DNA sequences in a test cell or cell population wherein steps (a) through (f) as
described above are followed and then the following steps of adjusting the ratio profile at 
each location along the reference metaphase chromosomes by multiplying the ratio profile
by the known copy number of DNA sequences in the standard cell or cell population that
bind there; 

determining the copy number of a calibration sequence in the test cell or cell
population that is substantially identical to a single copy sequence in the reference cells; 
and 

normalizing the adjusted ratio profile so that at the location of the calibration 
sequence on the reference metaphase chromosomes, the normalized, adjusted ratio profile 
is equal to the copy number of the calibration sequence determined in the above step, the 
value of the normalized, adjusted ratio profile at another location then being equal to the 
copy number of the DNA sequence in the test cell or cell population that binds at that 
location. That method can be analogously performed wherein two or more calibration 
sequences are used, and the adjusted ratio profile is normalized to get the best fit to the 
copy numbers of the ensemble of calibration sequences. Preferably, the copy number of 
the calibration sequence is determined by in situ hybridization. Those methods can 
comprise in situ hybridizing probes for more than one calibration position and 
normalizing to obtain the best fit of the ratio profile to the calibration positions. The 
standard cell or cell population preferably has a normal genome. In many applications of
CGH, the reference metaphase chromosomes are normal. 
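
When several calibration sequences are used, the normalization becomes a fit of a single scale factor. The text does not specify the fitting criterion; the sketch below assumes an ordinary least-squares fit, and the function name, indices and values are hypothetical.

```python
def best_fit_copy_numbers(adjusted_profile, calibrations):
    """Scale an adjusted ratio profile to best match known copy numbers.

    calibrations maps a position index to the copy number determined there
    (for example by in situ hybridization). The scale factor s minimizes
    sum((s * profile[i] - copies[i]) ** 2) over the calibration positions.
    """
    numerator = sum(adjusted_profile[i] * c for i, c in calibrations.items())
    denominator = sum(adjusted_profile[i] ** 2 for i in calibrations)
    if denominator == 0:
        raise ValueError("calibration positions carry no signal")
    scale = numerator / denominator
    return [value * scale for value in adjusted_profile]


# Hypothetical adjusted profile with calibration sequences at indices 0 and 3,
# each found to be present in two copies in the test cells.
adjusted = [1.0, 1.5, 0.5, 1.0]
copy_numbers = best_fit_copy_numbers(adjusted, {0: 2, 3: 2})
```

With a single calibration sequence the same function reduces to forcing the profile to equal the measured copy number at that one position.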

Further, this invention concerns the use of antenna cell lines. An exemplary
method is for detecting amplification of a certain sequence or group of sequences in a 
subject cell or cell population, comprising essentially steps (a) through (e) of the above- 
described methods wherein the in situ hybridization is targeted to antenna cells in which 
the DNA sequence or sequences to be tested for is or are amplified, and examining the reference cell
for regions that are hybridized significantly more intensely than others, the presence of
such regions indicating amplifications of the sequences which are being tested. The
chromosomes of said antenna cell lines may be in interphase or in metaphase. 

When a single labeled subject nucleic acid is being hybridized, or if multiple 
labeled subject nucleic acids are hybridized sequentially, it is important that the binding 
sites on the reference genome not be saturated prior to observing and/or measuring the 
signal intensity(ies). In the case of a single labeled subject nucleic acid, nonsaturation
can be effected in a number of ways, for example, by stopping the hybridization, by 
providing insufficient subject nucleic acid, and/or by providing a sufficient amount of 
unlabeled nucleic acid which is sufficiently complementary to the reference chromosomes 
to competitively prevent saturation of sites therein by the labeled subject nucleic acid. 

When there are two or more labeled subject nucleic acids, those subject nucleic 
acids can be hybridized in situ to the reference genome sequentially or simultaneously.
Simultaneous in situ hybridization is preferred in that saturation of the targeted binding
sites in the reference genome will not interfere with the procedure. When sequential in 
situ hybridization is used, it must be performed under conditions wherein the individual 
hybridizations are stopped well before the binding sites on the reference chromosomes are 
saturated. 

Objects of this invention are to detect sequence copy number imbalances 
throughout an entire genome in one hybridization, to map gains and/or losses of 
sequences in a genome, and/or to provide a copy number karyotype of a subject genome. 

Further, an object of this invention is to enable the detection of relative copy 
number differences that are common to a number of different cells and/or cell 
populations. For example, CGH methods can be used wherein DNAs extracted from 
cells of many different tumors are combined and labeled; the hybridization of those
combined labeled DNAs to normal condensed chromosomes provides for the rapid
identification of only those copy number changes that occurred in most of the tumors. 
Less frequently occurring variations would be averaged out. Thus, this invention further 
provides for a CGH method wherein two or more of the subject nucleic acids that were 
extracted from different cells and/or from numbers of cells from different cell 
populations, are labeled the same, and hybridized to a reference spread under conditions 
wherein repetitive sequences are removed and/or suppressed and wherein sequence copy 
number differences that are common in said combined labeled nucleic acid sequences are 
determined. 
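
The averaging effect of pooling can be illustrated numerically. The sketch below assumes that each tumor contributes an equal amount of DNA to the pool and that the normal copy number is two; the copy numbers are hypothetical.

```python
def pooled_ratio_profile(tumor_copy_numbers, normal_copy_number=2):
    """Average the copy number at each locus over all pooled tumors and
    express it relative to the normal copy number."""
    tumor_count = len(tumor_copy_numbers)
    locus_count = len(tumor_copy_numbers[0])
    return [
        sum(tumor[i] for tumor in tumor_copy_numbers) / tumor_count / normal_copy_number
        for i in range(locus_count)
    ]


# Hypothetical copy numbers at three loci in four tumors: locus 1 is gained
# in every tumor, while locus 2 is gained in only one of them.
tumors = [
    [2, 4, 2],
    [2, 4, 2],
    [2, 4, 3],
    [2, 4, 2],
]
pooled = pooled_ratio_profile(tumors)
```

The shared gain keeps its full ratio in the pool, whereas the gain found in a single tumor is diluted toward the normal ratio.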

Another object of this invention is to provide the means of cytogenetically 
analysing archived chromosomal material, that is, fixed material from, for example, 
biopsied tissue specimens, preferably cataloged and keyed to medical records of patients 
from whom the specimens were taken, and archaeological chromosomal material. Such 
chromosomal material cannot, of course, be karyotyped according to traditional means in 
that no live cells are present to culture and from which to prepare chromosomal spreads. 
However, the nucleic acid can be extracted therefrom and amplified by a polymerase 
chain reaction (PCR) procedure or by a non-PCR procedure and tested by the methods of 
this invention. 

This invention further provides for a method to detect simultaneously an 
ensemble of amplifications and/or deletions in a tumor wherein the results can be used to 
determine the subsequent behavior of that tumor. Said determination is made by 
associating the patterns of amplifications and/or deletions in tumor cells with the behavior 
of that tumor. Such associations can be made by testing, for example, as indicated 
immediately above, DNA from archived tumor tissue keyed to medical records, or when
fresh tumor specimens are tested by CGH and the patients are followed. Further, such
associations can be made with CGH methods wherein there is more than one subject cell
and/or cell population, for example, one or more tumors. 

Another object of this invention is to provide a method of analyzing cells from 
a suspected lesion at an early stage of development. An advantage of the methods of this 
invention is that only a few cells are necessary for the analysis. The early detection of 
amplifications and/or deletions in cells from a lesion allows for early therapeutic
intervention that can be tailored to the extent of, for example, invasiveness known to be 
associated with such genetic rearrangements. Further, such early detection provides a 
means to associate the progression of the cells with the genetic rearrangements therein 
detected by the methods of this invention. 

Tumors can be karyotypically heterogeneous containing therein various 
populations of cells each having different types of genetic rearrangements. As indicated 
above, tumor cells are difficult to culture, and it is not clear that cultured cells are
representative of the original tumor cell population. This invention provides the means to 
by-pass the culturing obstacle and allows genetic characterization of tumor cells and thus, 
of the heterogeneity of tumors by testing cells from different subregions thereof according 
to the methods of this invention. Bulk extraction of the nucleic acid from many cells of a 
tumor can also be used to test for consistent amplifications and/or deletions within a 
tumor.

It is another object of this invention to provide methods of detecting 
amplifications and/or deletions of nucleic acid sequences wherein certain cell lines termed
herein "antenna cell lines", are used to enhance the sensitivity of the detection.

It is still further an object of this invention to provide methods of prenatal or
perinatal analysis wherein the nucleic acid of the child's cells is extracted and tested 
according to the methods of this invention. In one embodiment of CGH, such material is 
human and hybridized to a normal human metaphase spread to detect whether any 
deletions and/or amplifications are therein present, for example, an extra copy of 
chromosome 21, diagnostic for Down syndrome. Test kits for performing CGH methods 
are also provided. 

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 illustrates the results of a CGH hybridization of DNA from the BT- 
474 human breast cancer cell line to a metaphase spread of normal peripheral 
blood lymphocyte human chromosomes. The BT-474 cell line is known to have a 13-fold
c-erbB-2 amplification. The DNA from that cell line was labeled with digoxigenin-11-dUTP
and stained with fluorescein isothiocyanate (FITC); signals from the hybridization
of the cell line DNA are green in the photomicrograph. A chromosome 17 pericentromeric
repeat probe (cosmid cK17.10) was labeled with biotin-14-dATP and stained
with Texas Red; signals from that probe's hybridization are red. The chromosomal DNA
was counterstained with 4',6-diamidino-2-phenylindole (DAPI) resulting in a blue
counterstaining. The photomicrograph was taken using a multicolor image analysis
system after contrast stretching and pseudocolor display.

The green signals indicating amplified sequences in the BT-474 cell line are
seen in Figure 1 at the following loci: 17q12 (the erbB-2 locus), 17q22-q23 and 20q13-ter.
The latter two sites were previously unrecognized sites of amplification in that cell
line. One centromeric repeat is non-specifically stained green.

Figure 2 schematically illustrates the general approach used in performing the
methods of this invention, Comparative Genomic Hybridization (CGH). The reference
chromosome spread is hybridized with various nucleic acid mixtures, either 
simultaneously or at different times, to obtain the desired information. Representative 
mixtures could include unlabeled sequences designed to block sequences in the various 
other nucleic acid pools, for example, the high-copy repetitive sequences in human 
genomic DNA; unlabeled competitor nucleic acid to prevent saturation of the target sites 
for the labeled mixtures, for example, human genomic DNA within a factor of 10 of the 
concentration used for the labeled subject nucleic acids (see Figure 5); and one or more 
pools of sequences of different origin that are differently labeled so that their binding can 
be independently assessed, for example, tumor and normal genomic DNA (see Figures 6 
and 7). The information on the sequence frequency of the labeled pools is obtained by 
analysis of the intensity of the individual signals and/or the differences in ratios of 
intensities among the signals as a function of position along the reference chromosomes. 

Figure 3 outlines general aspects of the CGH procedure used in Example 1, 
infra. The reference chromosome spread, in this example normal human chromosomes, 
is first hybridized for about one hour with a high concentration of unlabeled human 
genomic DNA (Figure 3a). That prehybridization blocks many of the high copy 
repetitive sequences in the chromosomes so that the high copy repetitive sequences in the 
labeled subject nucleic acid, in this case labeled tumor DNA, will not substantially
contribute to the signal during the subsequent hybridization. The labeled tumor DNA, 
and perhaps some competitor DNA or other comparison nucleic acid are then hybridized 
to the target reference spread (Figure 3b). Cot-1 DNA can be included in the
hybridization as in Example 1, below, to block more effectively the centromeric repetitive
sequences in the labeled subject nucleic acids. 

Figure 3 is representative of one way of reducing signals from repetitive 
sequences. Other methods are detailed herein infra. In each of the CGH methods 
including the procedures outlined in the rest of the figures, some means of reducing the 
signal from the repetitive sequences is used, but not specifically indicated in the figures. 
It is important for CGH that the signal from each subject nucleic acid be dominated by 
sequences that bind to well defined loci. Total suppression of the signal from the 
genomic repeats is not necessary, but the poorer the suppression, the less able the 
procedure is to detect small differences in sequence frequency. 

Figure 4 illustrates the procedure used in Example 1, for which representative
results are shown in the photomicrographs of Figures 1 and 8. As shown in Figure 4a,
labeled human tumor DNA is hybridized to a normal human chromosome spread. [Please
note, as indicated in the description for Figure 3, provisions were made to suppress the
signal from the repetitive sequences although those provisions are not specifically
indicated in the figure. Example 1 details one preferred method to suppress the
hybridization signals from repetitive sequences.] In this representative example, the tumor 
DNA is assumed to contain a region wherein some sequences are highly amplified, for 
example, an amplicon containing an oncogene. The amplified sequences in the tumor 
DNA may be clustered and integrated in some tumor chromosomes; they may be 
integrated into multiple places in the tumor genome; or, they may exist as
extrachromosomal elements. The sequences of the amplicon will map to some chromosomal
location in the reference genome, which in this case is a normal human genome.

The lower portion of Figure 4 illustrates the kinetics of the build-up of the
signal on a target reference chromosome. The signal builds more rapidly in the amplified 
region since more copies of those sequences are available for hybridization. If the 
reaction is stopped before the target chromosome is saturated, or if insufficient labeled 
DNA is added to achieve saturation, then the genomic region that was amplified in the 
tumor will appear higher in intensity on the normal chromosome as illustrated by the dark 
band on the left reference chromosome. The more intensely labeled region (dark band) 
indicates the location and extent of the amplicon as reflected in the reference genome. 
Thus, the amplification is detected without prior knowledge of its existence, and the 
origin of the amplified sequences is mapped in the normal human genome. 

If the reaction illustrated in Figure 4 is allowed to proceed to saturation of the 
target sites, contrast is lost, as shown by the representative reference chromosome on the 
right. Thus, in this embodiment of CGH, it is important to stop the hybridization before 
saturation of the target or provide insufficient probe for saturation. The graphs 
schematically show the build-up of the hybridization signal in the region that was 
amplified (graph on right) and in the remainder that was unamplified (graph on left). The 
arrows connect the chromosomal regions with the times of observation, on the kinetic 
curve. 
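
The loss of contrast at saturation can be illustrated with a toy kinetic model. The sketch below assumes, purely for illustration, that the fraction of target sites filled follows a pseudo-first-order curve whose rate is proportional to the relative copy number of the sequence in the labeled DNA; this model and its parameters are assumptions and are not taken from the figure.

```python
import math


def signal_fraction(relative_copy_number, time, rate=1.0):
    """Fraction of target sites filled at a given time under the assumed
    pseudo-first-order model (1.0 corresponds to saturation)."""
    return 1.0 - math.exp(-rate * relative_copy_number * time)


def contrast(amplification, time, rate=1.0):
    """Signal in the amplified region divided by signal in an unamplified
    region at the same time point."""
    return signal_fraction(amplification, time, rate) / signal_fraction(1.0, time, rate)


# Five-fold amplified sequence observed early and long after saturation.
early = contrast(5.0, 0.01)
late = contrast(5.0, 50.0)
```

In this model the early contrast is close to the level of amplification, while the late contrast approaches one because both regions are saturated.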

Figure 5 illustrates an embodiment of CGH that avoids the potential saturation 
of the target as shown in the lower right portion of Figure 4. In this representative 
example, the reference nucleic acid is a human chromosome spread; the subject nucleic 
acid is labeled tumor DNA. If unlabeled human genomic DNA is included with the 
labeled tumor DNA in excess, in this case at a five-fold higher concentration than that of 
the labeled tumor DNA, then any saturation of the target will be due to a combination of
labeled and unlabeled copies of the nucleic acid sequences, rather than just labeled copies
as shown in the lower right of Figure 4. [Once again, as indicated in Figures 3 and 4 the
means of reducing the signal from repetitive sequences is not indicated in this figure, but 
it is assumed that some protocol is performed to remove substantially the repetitive 
sequences that would bind to multiple loci in the reference genome and/or to block such 
sequences from binding to the target.] 

At the early stages of the reaction, the amplified region will build up faster 
than elsewhere in the chromosome (for example if the sequence is amplified five-fold, it 
would build up 5 times as fast) and will be detectable as in the lower left of Figure 4. 
However, as the reaction proceeds to saturation, the unamplified regions of the
chromosome reach only one-fifth (1/5) of the intensity shown in Figure 4, because most 
of the sites are filled by unlabeled copies of the sequences. On the other hand, a 
sequence that was amplified five-fold in the tumor would reach one-half (1/2) of the 
saturation intensity since an equal number of labeled and unlabeled copies of those 
sequences are present. Thus, contrast is maintained according to this embodiment at all 
stages of the reaction, although it changes as the reaction proceeds. 
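
The saturation levels described above can be checked with a simple competitive model. The sketch assumes that, at saturation, labeled and unlabeled copies occupy target sites in proportion to their relative abundance; the function name is hypothetical. Under this assumption a five-fold amplified sequence reaches one-half of the saturation intensity, as stated, and an unamplified sequence reaches about one-sixth, in line with the roughly one-fifth cited above.

```python
from fractions import Fraction


def labeled_fraction_at_saturation(amplification, competitor_excess):
    """Fraction of saturated target sites occupied by labeled copies.

    amplification is the relative copy number of the sequence in the labeled
    DNA; competitor_excess is the concentration of unlabeled DNA relative to
    the labeled DNA.
    """
    labeled = Fraction(amplification)
    unlabeled = Fraction(competitor_excess)
    return labeled / (labeled + unlabeled)


# Unlabeled genomic DNA at five-fold excess over the labeled tumor DNA.
unamplified = labeled_fraction_at_saturation(1, 5)
amplified = labeled_fraction_at_saturation(5, 5)
```

The ratio of the two values stays above one at saturation, which is the sense in which contrast is maintained throughout the reaction.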

Figure 6 illustrates an embodiment of CGH designed to enhance its sensitivity 
in detecting small changes in copy number of various sequences. When a CGH 
procedure as indicated in Figure 5 is followed, intrinsic variation in the saturation levels, 
or rate of signal build-up at different positions in the reference genome may not be 
indicative of abnormal gain or loss of sequences. Such intrinsic variations would 
interfere with interpretation of intensity differences as indicating differences in copy 
number of the sequences. This CGH embodiment overcomes that potential problem by 
providing a mixture of labeled subject nucleic acid, in this case tumor DNA labeled with
a green fluorochrome, and a differently labeled competitor nucleic acid, in this case
normal human genomic DNA labeled with a red fluorochrome. The two differently 
labeled DNAs are simultaneously hybridized to the chromosome spread. [Once again, 
removal of the repetitive sequences and/or blocking of the signal therefrom is performed 
but not illustrated.] Changes in the ratio of green to red along each of the chromosomes 
in the reference spread then indicate regions of increased or decreased sequence copy 
number in the tumor. Those ratio changes may result in color variations from red to 
yellow to green on the reference spread. 

Figure 7 graphically and schematically explains the kinetics underlying the 
CGH embodiment illustrated in Figure 6. In the center is one of the chromosomes of the 
reference chromosome spread, a normal human chromosome in this case. The darkness 
of the shading on the reference chromosome shows the ratio of green to red intensity 
along the chromosome. 

In the amplified region, the green/red ratio is much higher than in the normal
region, whereas in the deleted region the green/red ratio is less than in the normal region.
The arrows from examples of each of the different green/red intensity regions point to
kinetic curves that indicate the build-up of green (solid line for the tumor DNA) and red
(dashed line for the normal DNA) signals during the hybridization. In the normal region, 
upper left graph, the red and green signals build together. (They have been normalized to 
be equal for the purposes of this explanation.) In the amplified region, upper right, the 
green (tumor) signal builds up much more rapidly than the red (normal) signal, the
green/red ratio being approximately the level of amplification (given the normalization to 
the normal part of the chromosome). 



In the lower left of Figure 7, the signal build-up for the duplicated region is 
shown; the green (tumor) signal is 50% brighter than the red (normal) signal. In the 
lower right, the build-up for a deleted region is schematically described; the green 
(tumor) signal is 50% dimmer than the red (normal) signal. The ratio approach of this 
CGH embodiment further normalizes for the frequent finding that hybridization to some 
chromosomes in a spread is intrinsically brighter than that for others because of 
differences in the local hybridization environment. 
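
The way the ratio cancels local brightness differences can be shown with a short sketch. It assumes that a local brightness factor multiplies the green and red signals equally, that the normal genome carries two copies, and that the duplicated and deleted regions carry three copies and one copy respectively in the tumor; these values are illustrative assumptions.

```python
def observed_signals(test_copies, normal_copies, local_brightness):
    """Green (test) and red (normal) intensities at one location, both scaled
    by the same local brightness factor."""
    green = local_brightness * test_copies
    red = local_brightness * normal_copies
    return green, red


def green_to_red(test_copies, normal_copies, local_brightness):
    """Green-to-red ratio at one location."""
    green, red = observed_signals(test_copies, normal_copies, local_brightness)
    return green / red


# The same duplication seen on a dim and on a bright chromosome.
duplicated_dim = green_to_red(3, 2, 0.5)
duplicated_bright = green_to_red(3, 2, 4.0)
# A deletion of one of two copies.
deleted = green_to_red(1, 2, 4.0)
```

Both duplicated regions give the same ratio even though their absolute intensities differ eight-fold.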

Figure 8 illustrates an example of how a deletion can be detected using CGH.
A deletion is simulated by employing DNA from a human primary breast carcinoma (XX) 
as a subject genome and a normal male chromosome spread (XY) as the reference 
genome. The absence of the Y-chromosome in the tumor DNA was detected, as would a 
cytogenetically significant deletion, by the hybridization. DNA from the primary breast 
carcinoma was labeled with digoxigenin-11-dUTP and stained with fluorescein
isothiocyanate (FITC) (green signals). The normal male peripheral blood lymphocyte
metaphase was counterstained with 4',6-diamidino-2-phenylindole (DAPI) (blue). The
picture was taken from a multicolor image analysis system (QUIPS) after image 
thresholding and contrast stretching. The green chromosomal fluorescence level on all 
chromosomes was increased to make the absence of this fluorescence on the Y- 
chromosome (arrow) more readily visible. The Y-chromosome is only stained with the 
DAPI counterstain. 

Figure 9 presents an idiogram of chromosome 1 from the breast cancer cell line
600 MPE, the karyotype for which was published by Smith et al., JNCI, 78: 611-615
(1987).

Figure 10A is a photomicrograph showing the comparative genomic
hybridization (CGH) of DNA from a 45, XO cell line (green) and a normal human female 
DNA (red) to a normal human male reference spread. The reddish color of the X 
chromosome, pointed out by the large arrow, as compared with the autosomes reflects the 
lower relative copy number of the X chromosome sequences in the 45, XO cell line. 
Faint staining of a small part of the Y chromosome, pointed out by the small arrow, is a 
result of the binding of homologous sequences in the pseudo-autosomal region. 

Figure 10B graphically illustrates the correlation of the number of X 
chromosomes in five fibroblast cell lines and the average green-to-red ratio of the X 
chromosome(s) relative to the same ratio for the autosomes.

Figure 11 illustrates green-to-red fluorescence ratio profiles of chromosomes 1, 
9, 11, 16 and 17 after comparative genomic hybridization with breast cancer cell line 
600PE (green) and with a normal DNA (red). The profiles reflect the relative copy
number of the chromosomal regions. Fluorescence in situ hybridization (FISH) with 16p
and 16q cosmid probes to interphase and metaphase 600PE cells indicated that there were
two signals with 16p cosmid probes and one signal from the 16q cosmid probes. That 
information on the absolute copy number of those loci provided by FISH permits 
interpretation of the ratio 1.0 as indicating that there are two copies of the sequence 
throughout the genome. 

The dip in the profile at 1p34 through 1p36 may represent a previously
unsuspected small interstitial deletion; however, that observation has not yet been 
independently verified with specific probes for that region. 

Centromeric and heterochromatic regions of the genome are not included in the 
analysis because the Cot-1 DNA partially blocks signals in those regions, and the large
copy number polymorphisms between individual sequences at those loci effect unreliable
ratio data. 

Figure 12(A) and 12(B) respectively provide green-to-red fluorescence ratio 
profiles of chromosome 8 (A) and chromosome 2 (B) after comparative genomic 
hybridization respectively with COLO 320 HSR (human colon adenocarcinoma cell line) 
and NCI H69 (small cell lung carcinoma cell line) cell line DNAs (green) and with 
normal human DNA (red). The inserts illustrate the overlaid green and red fluorescence 
images of the chromosomes, and the chromosomal medial axis drawn by the image 
analysis program used. 

In Figure 12(A), the myc locus at 8q24 shows a highly elevated green-to-red 
ratio, which is consistent with the known high level amplification of myc in the COLO 
320HSR cell line. 

In Figure 12(B), three regions of amplification are seen on chromosome 2. The 
signal at 2p24 corresponds to the location of N-myc known to be amplified in the NCI- 
H69 cell line. The two other regions with a highly increased green-to-red fluorescence 
ratio, at 2p21 and 2q21, were not previously known to be amplified in the NCI-H69 cell
line. 

Figure 13 is a photomicrograph of a comparative genomic hybridization (CGH) 
with BT-20 (breast cancer cell line) cell line DNA (green) and normal DNA (red) to a 
normal human metaphase spread. Loss of DNA sequences in the tumor cell line DNA
relative to normal DNA is shown in red, whereas gain of DNA sequences in the tumor
cell line is shown in green.

Figure 14. Quantitation of green to red fluorescence intensities along the two
homologues of chromosomes 1, 8, 10, 16 and 20 after CGH with primary breast cancer
DNA in green and normal DNA in red. An increased ratio is observed at 8q (high-level
gain) and at 1q (low-level gain). Regional copy number increases at 10q22 and at 20q12-q13
are also evident, whereas chromosome 16 shows no changes. The ratios are
normalized so that the average green to red ratio for each metaphase cell is 1.00. Ratio 
changes at repeat-rich heterochromatic regions are not reliable and are displayed with 
dotted lines. The chromosome diagrams below each ratio profile are shown only for 
approximate visual comparison and were not used for localizing the changes. 
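
The normalization mentioned for Figure 14 can be sketched as a division by the mean ratio. The sketch assumes the ratios of one metaphase cell are available as a flat list and that an unweighted mean is used; the values are hypothetical.

```python
def normalize_mean_to_one(ratios):
    """Rescale green-to-red ratios so that their average equals 1.00."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    mean = sum(ratios) / len(ratios)
    if mean == 0:
        raise ValueError("mean ratio must be non-zero")
    return [r / mean for r in ratios]


# Hypothetical raw ratios sampled across one metaphase cell.
raw_ratios = [0.6, 0.6, 1.2, 0.6, 0.6, 0.6]
normalized_ratios = normalize_mean_to_one(raw_ratios)
```

After this step, values above 1.00 mark regions with a higher than average green-to-red ratio in that cell, and values below 1.00 mark regions with a lower one.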

Figure 15. Chromosomal localization of DNA sequence copy number increases
in 33 primary breast tumors (left side of the chromosome diagrams) and 15 breast cancer 
cell lines (right). Low-level copy number increases are shown in blue and high-level in 
red. The chromosomal band location of the changes was determined directly based on the 
DAPI banding for each chromosome. Because high-resolution sub-band localization was 
not possible using DAPI staining, the regions shown in the figure are often larger than the 
actual size of the amplicon. 

Figure 16 shows the gains and losses of DNA sequences in primary bladder
carcinomas. 

Figure 17 summarizes the CGH and LOH data for ovarian cancers by
chromosome arm in Figure 16 for 30 Grade III tumors.

Figure 18 shows gene dosage abnormality detected by CGH analyses of 30
grade III ovarian cancers.

Figure 19 shows a schematic illustration of a model of progression in which
tumors progress as a result of accumulation of genetic aberrations, some of which confer
the same phenotype.

Figure 20 shows: Panel a. Chromosomal locations of cosmid probes mapped
along chromosome 20. Panel b. Level of amplification in BT474 along chromosome 20 in
interphase nuclei determined using FISH with mapped probes.

DETAILED DESCRIPTION 

Comparative Genomic Hybridization (CGH) has also been termed Copy Ratio 
Reverse Cytogenetics (CRRC), competition hybridization and quantitative in situ ratio 
karyotyping (QUIRK). Further, in the embodiment wherein fluorochromes are used as 
labels, it has been termed competition FISH (fluorescence in situ hybridization). CGH 
specifically provides methods whereby amplifications, duplications and/or deletions can be 
identified in an immediate overview of a genome. 

CGH provides methods for determining variations in the copy number of 
different elements in a mixture of nucleic acid sequences (for example, genomic DNA 
isolated from a tumor) as a function of the location of those sequences in the genome of a 
reference organism (for example, the genome of a normal cell from the same species). 
The methods comprise the use of in situ hybridization of the nucleic acid sequence 
mixture to a chromosome spread of the reference organism, and measuring the intensity 
of the hybridization at different locations along the target chromosomes. Exemplary 
methods are schematically outlined in Figures 2-7. Those illustrative examples are not 
exhaustive but suggest the wide range of variations and other uses of the basic approach. 

As the figure descriptions indicate, it is critical that signals from repetitive 
sequences do not dominate the signal from the subject nucleic acid pool, and that they be 
removed from the pool or that their signals be suppressed as necessary. It is preferred to 
exclude sequences from the hybridization or block sequences in the hybridization mixture
that could bind to multiple clearly separated positions on the chromosomes, for example,
sites that are on different chromosomes, or that are on the same chromosome but are 
well-separated. In many applications of CGH, it is the high copy repetitive sequences, 
such as Alu, Kpn, Lines, and alpha-satellites among others, that are removed from the 
labeled subject nucleic acid and/or which are blocked and/or the binding sites therefor are 
blocked. Described herein are methods to remove and/or block those repetitive signals. 
It should be noted that nucleic acid sequences in the labeled nucleic acid that bind to 
single copy loci are substantially retained in the hybridization mixture of labeled subject 
nucleic acids, and such single copy sequences as well as their binding sites in the 
reference chromosome spread remain substantially unblocked relative to the repetitive 
sequences that bind to multiple loci (that is, loci that are visually distinguishable) both 
before and during the hybridization. 

The methods of this invention provide the means to identify previously 
unknown regions of amplification and deletion. For example, one embodiment of CGH 
as detailed in Example 1 herein provides an efficient method that gives an immediate 
overview of a genome identifying all regions that are amplified greater than about
five-fold to ten-fold as well as at least large deletions. More sensitive embodiments that can
identify smaller amplifications and deletions are also disclosed. 

Nanogram quantities of the subject nucleic acids are required for the CGH 
methods of this invention. Paraffin embedded tumor sections can be used as well as fresh 
or frozen material. Snap frozen material from normal and malignant tissue is preferred
for mRNA isolation.

Standard procedures can be used to isolate the required nucleic acid from the
subject cells. However, if the nucleic acid, for example, DNA or mRNA, is to be
extracted from a low number of cells (as from a particular tumor subregion) or from a
single cell, it is necessary to amplify that nucleic acid, by a polymerase chain reaction
(PCR) procedure or by a non-polymerase chain reaction (non-PCR) procedure. PCR and
preferred PCR procedures are described infra. Exemplary non-PCR procedures include
the ligase chain reaction (LCR) and linear amplification by use of appropriate primers and
their extension (random priming).

Some of the various embodiments of CGH are illustrated, particularly in
Figures 2-7. In the embodiment illustrated in Figures 6 and 7, wherein a subject nucleic
acid, in this case human genomic DNA, is labeled differently from another subject
nucleic acid, amplifications and/or deletions are indicated by a change in ratio between
the different signals, rather than just a change in signal intensity.

The representative examples concerning CGH of Examples 1, 2 and 3 below 
involve the hybridizations of tumor cell line DNA to normal human metaphase spreads. 
However, there are many permutations and combinations of pairwise and multiple 
hybridizations of different nucleic acids from different genomes all of which are 
considered to be within the scope of this invention. 

For example, CGH could be used to hybridize labeled DNA from a tumor cell 
line to metaphase spreads of that same cell line to estimate the level and pattern of 
amplification in each cell line, comparing those results to hybridizations of said tumor cell 
line DNA to a normal human metaphase spread. Alternatively, labeled tumor cell line
DNA and differently labeled human genomic DNA could be simultaneously hybridized to
a metaphase spread of a tumor cell line. Further, DNA from a primary
tumor and that from its metastasis could be differently labeled and hybridized in a CGH
method to a normal human metaphase or to a related tumor cell line metaphase. Those
are just some of the many examples of CGH.

Although the examples herein concern the hybridizations of the DNA from 
breast cancer cell lines and primary tumors to normal human metaphase spreads, it will 
be clear to anyone skilled in the art that CGH is not limited to studying genomes of 
cancer cells or to the results of hybridizing abnormal genomes to normal genomes. CGH 
permits the comparison of nucleic acid sequence copy frequencies of any two or more 
genomes, even genomes of different species if their nucleic acid sequences are sufficiently 
complementary to allow for meaningful interpretation. It should be noted regarding 
interspecies comparisons that the information obtained by CGH includes not only an 
assessment of relative copy number but also that of sequence divergence. 

It will also be clear to those skilled in the art that hybridization with nucleic 
acid other than chromosomal DNA, such as messenger RNA (mRNA) or complementary
DNA (cDNA) of subject cells can be used to determine the location and level of
expression of genes in those cells. Conventional methodology is used to extract mRNA
from a cell or cell population, and to synthesize cDNA in vitro by reverse transcription.

CGH does not require the preparation of condensed chromosomes, for example, 
metaphase, prophase or other condensed chromosomal states, of the subject genomes. 
Thus, genomes from which metaphase, prophase or otherwise condensed chromosomal 
spreads are difficult, time-consuming or not possible to prepare at least in good quality, 
for example, genomes of tumor cells or fetal cells can be studied by CGH. 

In CGH, labeled subject nucleic acids, for example, labeled tumor DNA, are
hybridized to a reference genome, for example, a normal human metaphase spread, under
conditions in which the signal from amplified, duplicated and/or deleted nucleic acid
sequences from the labeled nucleic acid can be visualized with good contrast. Such
visualization is accomplished by suppressing the hybridization of repetitive sequences that
bind to multiple loci including the high copy interspersed and clustered repetitive
sequences, such as Alu, Kpn, Lines, alpha-satellites among others, using unlabeled total
human genomic nucleic acid, preferably DNA, and/or the repeat-enriched (Cot-1) fraction
of genomic DNA, and/or by removing such repetitive sequences from the hybridization
mixture. In providing the detection sensitivity required, the extent of suppression of the
hybridization of repetitive sequences and/or removal thereof can be adjusted to the extent
necessary to provide adequate contrast to detect the differences in copy number being
sought; for example, subtler copy number changes may require the suppression or
removal of lower level repetitive sequences.

When combining more than one labeled nucleic acid in a hybridization mixture, 
the relative concentrations and/or labeling densities may be adjusted for various purposes. 
For example, when using visual observation or photography of the results, the individual 
color intensities need to be adjusted for optimum observability of changes in their relative 
intensities. Adjustments can also be made by selecting appropriate detection reagents 
(avidin, antibodies and the like), or by the design of the microscope filters among other 
parameters. When using quantitative image analysis, mathematical normalization can be 
used to compensate for general differences in the staining intensities of different colors. 
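
The mathematical normalization mentioned above can be sketched as follows. This is a minimal illustration only, assuming that paired green and red intensities have already been measured at matching positions in one metaphase; the function name and the choice to rescale to a mean ratio of 1.0 (as in the Figure 14 description) are illustrative assumptions, not a prescribed implementation.

```python
def normalize_ratios(green, red):
    """Rescale green/red ratios so that their mean over the metaphase is 1.0.

    green, red: sequences of positive intensities measured at the same
    positions (hypothetical inputs for illustration).
    """
    if len(green) != len(red) or not green:
        raise ValueError("green and red must be non-empty and of equal length")
    if any(r <= 0 for r in red):
        raise ValueError("red intensities must be positive")
    raw = [g / r for g, r in zip(green, red)]
    mean_raw = sum(raw) / len(raw)
    if mean_raw <= 0:
        raise ValueError("mean green/red ratio must be positive")
    # Dividing by the mean removes any overall difference in staining
    # intensity between the two colors.
    return [x / mean_raw for x in raw]


if __name__ == "__main__":
    green = [200.0, 200.0, 400.0, 200.0]
    red = [100.0, 100.0, 100.0, 100.0]
    print(normalize_ratios(green, red))
```

After such rescaling, a value above 1.0 suggests a relative gain and a value below 1.0 a relative loss at that position, independent of which color happened to stain more brightly overall.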

The kinetics of the CGH hybridizations are complicated. Since the subject 
nucleic acids are frequently double stranded, complementary sequences will reassociate in 
the hybridization mix as well as hybridizing to the target. Such reassociation may result 
in a more rapid decrease in concentration of the high copy sequences than the low copy 
ones, thereby making the signal intensity variations on the reference chromosomes less
pronounced than the copy differences in the original subject DNAs. In addition,
non-specific binding of the labeled subject DNAs to the slide, coverslip, etc., may generally
reduce the concentration of that labeled subject nucleic acid during the hybridization. 
Those skilled in the art will recognize numerous methods of optimizing the quantitative 
aspects of CGH, such as, mathematical correction of digital images, supplying freshly 
denatured subject DNA during the hybridization, and adding unlabeled genomic DNA in 
excess to dominate the reassociation rates. 
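
The compression of copy number differences by reassociation can be illustrated with a simplified model. The sketch below assumes ideal second-order self-reassociation, in which a sequence starting at concentration C0 has single-stranded concentration C0 / (1 + k*C0*t) after time t. That model, the function names and the dimensionless parameter x = k*C0*t are assumptions made for illustration; the actual kinetics are more complicated, as noted above.

```python
def single_stranded_remaining(relative_conc, x):
    """Single-stranded concentration left after reassociation.

    relative_conc: starting concentration relative to a single-copy sequence.
    x: dimensionless k * C0 * t for the single-copy sequence (assumed model).
    The result is expressed in units of the single-copy starting concentration.
    """
    return relative_conc / (1.0 + relative_conc * x)


def apparent_ratio(true_ratio, x):
    """Ratio of available amplified to single-copy sequence after reassociation."""
    return single_stranded_remaining(true_ratio, x) / single_stranded_remaining(1.0, x)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 5.0):
        print(x, apparent_ratio(5.0, x))
```

In this model a five-fold amplified sequence is depleted faster than a single-copy sequence, so the apparent ratio falls below five as reassociation proceeds, which is the qualitative effect described in the preceding paragraph.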

The resolution of CGH is presently at a level that can be seen through a light 
microscope, as is traditional cytogenetic staining. Thus, if a small sequence in a subject 
nucleic acid is amplified, to be seen as a signal in a subject genome, it must be amplified 
enough times for its signal to be able to be visualized under a light microscope. For 
example, the locus for erbB-2 which is relatively small (very approximately, a few 
hundred kb), needs to be amplified at least greater than five times to be visually 
distinguishable under a light microscope when the CGH embodiment used in Example 1 is 
employed. On the other hand, if a large section of a chromosome is present at increased 
frequency in a subject nucleic acid, the signal from that region would show up in the 
reference genome at a much lower level of amplification. 

The term "labeled" is herein used to indicate that there is some method to 
visualize nucleic acid fragments that are bound to the target, whether or not the fragments 
directly carry some modified constituent. A section infra entitled "Labeling the Nucleic
Acid Fragments of the Subject Nucleic Acids" describes various means of directly 
labeling the probe and other labeling means by which the bound probe can be detected. 

The phrase "antenna cell line" is herein used to indicate a reference genome 
that has one or more known significant genetic aberrations, for example, a cell line
known to have an oncogene that is highly amplified, for example, in large homogeneously
staining regions (HSRs). The amplified regions of that cell line would thus provide a 
much bigger target site than a normal chromosome spread. Thus, observation of the 
signal from such a large target site would be easier in that on average the signal would be 
brighter from amplified target sequences in the reference genome as provided by such an 
antenna cell line. A subject nucleic acid extracted from, for example, a number of tumor 
cells, could be tested by a CGH hybridization to such an antenna cell line to see if it also 
contained amplifications of the oncogene known to be amplified in the cell line. 

When an antenna cell line is used as the reference genome, there are instances 
wherein it can be used in interphase rather than as a chromosome spread. For example, 
if one is checking to see if a certain oncogene is amplified or not in the subject nucleic 
acid, interphase CGH is sufficient. However, the maximum amount of information is 
provided when condensed chromosome spreads are used. 

A base sequence at any point in the genome can be classified as either "single- 
copy" or "repetitive". For practical purposes the sequence needs to be long enough so 
that a complementary probe sequence can form a stable hybrid with the target sequence 
under the hybridization conditions being used. Such a length is typically in the range of 
several tens to hundreds of nucleotides. 

A "single-copy sequence" is that wherein only one copy of the target nucleic 
acid sequence is present in the haploid genome. "Single-copy sequences" are also known
in the art as "unique sequences". A probe complementary to a single-copy sequence has
one binding site in the haploid genome. A "repetitive sequence" is that wherein there is more
than one copy of the same target nucleic acid sequence in the genome. Each copy of a
repetitive sequence need not be identical to all the others. The important feature is that
the sequence be sufficiently similar to the other members of the family of repetitive
sequences such that under the hybridization conditions being used, the same fragment of 
probe nucleic acid is capable of forming stable hybrids with each copy. 

Herein, the terms repetitive sequences, repeated sequences and repeats are used 
interchangeably. 

The phrase "metaphase chromosomes" is herein defined to encompass the
concept of "condensed chromosomes" and is defined to mean not only chromosomes 
condensed in the prophase or metaphase stage of mitosis but any condensed 
chromosomes, for example, those condensed by premature chromosome condensation or 
at any stage in the cell cycle wherein the chromosome can be visualized as an individual 
entity. It is preferred that the chromosomes in the reference genome be as long as 
possible but condensed sufficiently to be visualized individually. 

A subject nucleic acid is herein considered to be the same as another nucleic 
acid if it is from a member of the same sex of the same species and has no significant 
cytogenetic differences from the other nucleic acid. For example, the DNA extracted 
from normal lymphocytes of a human female is considered for the purposes of this 
invention to be the same nucleic acid as that of DNA from normal cells of a human 
female placenta. 

The following abbreviations are used herein: 



Abbreviations 

AAF - N-acetoxy-N-2-acetyl-aminofluorene 

ATCC - American Type Culture Collection 

BN - bicarbonate buffer with NP-40

BRL - Bethesda Research Laboratories

bp - base pair

CCD - charge coupled device

CGH - Comparative Genomic Hybridization

Chr. - chromosomal

CML - chronic myelogenous leukemia

CRRC - Copy Ratio Reverse Cytogenetics

DAPI - 4',6-diamidino-2-phenylindole

dATP - deoxyadenosine triphosphate

DCS - as in fluorescein-avidin DCS (a commercially available cell sorter grade of 
fluorescein Avidin D) 

dCTP - deoxycytosine triphosphate

dGTP - deoxyguanosine triphosphate

DI - DNA index

DM - double minute chromosome

dNTP - deoxynucleotide triphosphate

dTTP - deoxythymidine triphosphate

dUTP - deoxyuridine triphosphate

EDTA - ethylenediaminetetraacetate 

E/P - estrogen/progesterone 

FISH - fluorescence in situ hybridization 

FACS - fluorescence-activated cell sorting 

FITC - fluorescein isothiocyanate 

HPLC - high performance liquid chromatography

HSR - homogeneously staining region

ISCN - International System for Cytogenetic Nomenclature 

IB - isolation buffer 

kb - kilobase

kDa - kilodalton 

LOH - loss of heterozygosity 

Mb - megabase 

met. - metastasis 

min - minute 

ml - milliliter 

mM - millimole

mm - millimeter 

ng - nanogram 

NIGMS - National Institute of General Medical Sciences 

NP-40 - non-ionic detergent commercially available from Sigma as Nonidet P-40 
(St. Louis, MO) 

PBS - phosphate-buffered saline 

PCR - polymerase chain reaction 

PHA - phytohemagglutinin 

PI - propidium iodide

pl. - pleural

PMSF - phenylmethylsulfonyl fluoride 

PN buffer - mixture of 0.1 M NaH2PO4 and 0.1 M Na2HPO4, pH 8; 0.1% NP-40

PNM buffer - PN buffer plus 5% nonfat dry milk (centrifuged); 0.02% Na azide

QUIRK - quantitative in situ ratio karyotyping

Rb-1 - retinoblastoma tumor suppressor gene

RFLP - restriction fragment length polymorphism

RPM - revolutions per minute

SD - Standard Deviation

SDS - sodium dodecyl sulfate

SSC - 0.15 M NaCl/0.015 M Na citrate, pH 7

Td - doubling time

µg - microgram

µl - microliter

µm - micrometer

µM - micromole

VNTR - variable number tandem repeat

Resolution of differences in copy number can be improved by the use of image 
analysis and by averaging the results from hybridizations of a subject nucleic acid to 
multiple condensed chromosome spreads. Using such methods, the background signal 
(noise) can be differentiated from actual nucleic acid sequence copy number differences.
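
As one way to picture the averaging described above, the following sketch combines ratio profiles measured for the same chromosome in several metaphase spreads and flags positions whose mean ratio departs from 1.0 by more than a chosen number of standard errors. The function names, the requirement that all profiles be sampled at the same positions, and the cutoff `k` are hypothetical choices for illustration, not values taken from this disclosure.

```python
from math import sqrt
from statistics import mean, stdev


def average_profiles(profiles):
    """Return per-position mean and standard deviation across spreads.

    profiles: list of equal-length ratio profiles, one per metaphase spread.
    """
    if not profiles or any(len(p) != len(profiles[0]) for p in profiles):
        raise ValueError("profiles must be non-empty and of equal length")
    columns = list(zip(*profiles))
    means = [mean(col) for col in columns]
    # A standard deviation needs at least two spreads.
    sds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, sds


def flag_changes(means, sds, n_spreads, k=2.0):
    """Flag positions whose mean ratio differs from 1.0 by more than k standard errors."""
    return [abs(m - 1.0) > k * s / sqrt(n_spreads) for m, s in zip(means, sds)]


if __name__ == "__main__":
    profiles = [[1.0, 1.1, 2.0], [1.0, 0.9, 2.2], [1.0, 1.0, 1.8]]
    means, sds = average_profiles(profiles)
    print(means, sds, flag_changes(means, sds, len(profiles)))
```

Fluctuations that vary in sign from spread to spread tend to cancel in the mean, whereas a true copy number difference shifts the ratio consistently in every spread.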

Image Analysis: 

An image analysis system, preferably computer assisted, can be used to 
enhance and/or accurately quantitate the intensity differences between and/or among the 
signals from a hybridization and the background staining differences for more accurate 
and easier interpretation of results. Image analysis and methods to measure intensity are
described, for example, in Hiraoka et al., Science, 238: 36-41 (1987) and Aikens et al.,
Meth. Cell Biol., 29: 291-313 (1989). In such an image analysis system, it is preferred
to use a high quality CCD camera whose intensity response is known to be linear over a 
wide range of intensities. 

The components of a particular quantitative image processing system (QUIPS)
are described in Example 1 under the subheading Fluorescence Microscopy and
Interpretation of Results. As exemplified in Example 1, a computer-assisted image
analysis system with a filterwheel is used so that the images from the signals and
counterstaining of the DNA are superimposed on one image. Pseudocolors, that is,
colors that are not exactly spectrally converted, can be displayed. Contrast stretching
can be used, wherein the differences between the intensity levels of the signals and
background staining differences are enhanced by adjusting controls of the image analysis
system. Thresholding can also be used wherein the background staining can be assigned a
value close to zero so it would barely appear in the processed image from such a system.
Similarly, computer analysis permits subtraction of background, smoothing of
fluctuations in the signals, accurate intensity and ratio calculations and the ability to
average signals on chromosomes in multiple spreads.
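
The background subtraction, smoothing and ratio calculation mentioned above might be combined along the following lines for a one-dimensional intensity profile sampled along a chromosome axis. This is a hedged sketch and not the QUIPS implementation: the function names, the moving-average window and the constant background levels are all assumptions made for illustration.

```python
def subtract_background(signal, background):
    """Subtract a constant background level, clipping negative values to zero."""
    return [max(s - background, 0.0) for s in signal]


def smooth(signal, window=3):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def ratio_profile(green, red, green_bg, red_bg, window=3, floor=1e-9):
    """Green-to-red ratio along the axis after background removal and smoothing.

    Positions where the processed red signal is not above `floor` yield None,
    because no meaningful ratio can be formed there.
    """
    g = smooth(subtract_background(green, green_bg), window)
    r = smooth(subtract_background(red, red_bg), window)
    return [gi / ri if ri > floor else None for gi, ri in zip(g, r)]


if __name__ == "__main__":
    green = [30.0, 30.0, 50.0, 50.0, 50.0]
    red = [20.0, 20.0, 20.0, 20.0, 20.0]
    print(ratio_profile(green, red, green_bg=10.0, red_bg=10.0))
```

Smoothing each color before forming the ratio is one possible design choice; it avoids amplifying pixel-level noise in the denominator.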

Absolute Copy Numbers:

Hybridization of the subject DNAs to the reference chromosomes gives
information on relative copy numbers of sequences. Some additional normalization is
required to obtain absolute copy number information. One convenient method to do this is
to hybridize a probe, for example a cosmid specific to some single locus in the normal
haploid genome, to the interphase nuclei of the subject cell or cell population(s) (or those
of an equivalent cell or representative cells therefrom, respectively). Counting the
hybridization signals in a representative population of such nuclei gives the absolute
sequence copy number at that location. Given that information at one locus, the intensity
(ratio) information from the hybridization of the subject DNA(s) to the reference
condensed chromosomes gives the absolute copy number over the rest of the genome. In
practice, use of more than one reference locus may be desirable. In this case, the best fit 
of the intensity (ratio) data through the reference loci would give a more accurate 
determination of absolute sequence copy number over the rest of the genome. 
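
The conversion from relative ratios to absolute copy numbers can be sketched as below. The sketch assumes that absolute copy number is directly proportional to the measured ratio, so that a single scale factor fitted by least squares through the reference loci is sufficient; that proportionality, the function names and the example numbers are illustrative assumptions, and other fitting choices are possible.

```python
def scale_from_reference_loci(ref_ratios, ref_copies):
    """Least-squares scale k (line through the origin) with copies ~ k * ratio.

    ref_ratios: CGH ratios measured at the reference loci.
    ref_copies: absolute copy numbers counted at those loci, for example by
    interphase hybridization with locus-specific probes.
    """
    if len(ref_ratios) != len(ref_copies) or not ref_ratios:
        raise ValueError("reference ratios and copies must match and be non-empty")
    numerator = sum(r * c for r, c in zip(ref_ratios, ref_copies))
    denominator = sum(r * r for r in ref_ratios)
    if denominator == 0:
        raise ValueError("reference ratios must not all be zero")
    return numerator / denominator


def absolute_copy_numbers(ratio_profile, ref_ratios, ref_copies):
    """Apply the fitted scale to every ratio in the profile."""
    k = scale_from_reference_loci(ref_ratios, ref_copies)
    return [k * r for r in ratio_profile]


if __name__ == "__main__":
    # One reference locus with ratio 0.5 counted at 2 copies per nucleus.
    print(absolute_copy_numbers([0.5, 1.0, 0.25], [0.5], [2]))
```

With a single reference locus the fit reduces to dividing the counted copy number by the ratio at that locus; additional loci average out measurement error in any one of them.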

Thus, the CGH methods of this invention combined with other well-known
methods in the art can provide information on the absolute copy numbers of substantially
all RNA or DNA sequences in subject cell(s) or cell population(s) as a function of the
location of those sequences in a reference genome. For example, one or more
chromosome-specific repeat sequence or high complexity painting probes can be
hybridized independently to the interphase nuclei of cells representative of the genomic
constitution of the subject cell(s) or cell population(s). Whole chromosome painting
probes are now available for all the human chromosomes [Collins et al., Genomics, 11:
997-1006 (1991)]. Specific repeat-sequence probes are also available [Trask et al., Hum.
Genet., 78: 251 (1988) and references cited therein; and commercially available from
Oncor (Gaithersburg, MD, USA)]. Hybridization with one or more of such probes
indicates the absolute copy numbers of the sequences to which the probes bind.

For such interphase analysis, painting probes with a complexity of from about 
35 kb to about 200 kb, are preferred; probes from about 35 kb to about 100 kb are 
further preferred; and still more preferred are probes having a complexity of from about 
35 kb to 40 kb, for example, a cosmid probe. Exemplary of such locus-specific painting
probes are any cosmid, yeast artificial chromosome (YAC), bacterial artificial
chromosome (BAC), and/or P1 phage probes as appropriate, preferably to the arms of a
selected chromosome. Such cosmid probes, for example, are commercially available
from Clontech [South San Francisco, CA (USA)] which supplies cosmid libraries for all
the human chromosomes. Another example of a cosmid probe that could be used in such
methods of this invention would be a 3p cosmid probe called cCI3-787 obtained from
Yusuke Nakamura, M.D., Ph.D. [Division of Biochemistry, Cancer Institute, Toshima,
Tokyo, 170, Japan]. Its isolation and mapping to 3p21.2-p21.1 is described in
Yamakawa et al., Genomics, 1(3): 536-543 (1991). Another example would be a 3q
cosmid probe named J14R1A12 obtained from Wen-Lin Kuo [Biomedical Department,
P.O. Box 5507 (L-452), Lawrence Livermore National Laboratory, Livermore, CA 94550
(USA)]. For interphase analysis, preferred repeat sequence probes are
centromeric-specific and/or peri-centromeric-specific repeat sequence probes. Examples of
such probes are the chromosome 17 peri-centromeric repeat probe (cosmid cK17.10)
and the alpha satellite repeat probe for the centromeric region of chromosome 8, both of
which are described in Example 1 infra. A variety of repeat sequence probes are
commercially available from Oncor [Gaithersburg, MD (USA)]. However, the
locus-specific painting probes are preferred over the repeat sequence probes for the methods of
this invention to determine absolute copy numbers of nucleic acid sequences. 

Further, when the subject nucleic acid sequences are DNA, the reference copy 
numbers can be determined by Southern analysis. When the subject nucleic acid 
sequences are RNA, the reference copy numbers can be determined by Northern analysis. 

Those reference copy numbers or reference frequencies provide a standard by 
which substantially all the RNA or DNA sequences in the subject cell(s) or cell
population(s) can be determined. CGH methods are used to determine the relative copy
numbers of the rest of the sequences. However, absolute copy numbers require a 
standard against which the results of CGH can be determined. Otherwise the CGH 
procedures would have to be highly standardized and quantitated to see differences in the 
absolute copy numbers of sequences in a genome, for example, haploidy, triploidy, 
octaploidy, wherein there are 1, 3 and 8 copies of each of the chromosomes, respectively. 

PCR and Microdissection:

The mechanics of PCR are explained in Saiki et al., Science, 230: 1350 (1985)
and U.S. Patent Nos. 4,683,195, 4,683,202 (both issued July 18, 1987) and 4,800,159
(issued January 24, 1989). PCR offers a rapid, sensitive and versatile cell-free molecular
cloning system in which only minute amounts of starting material are required. 

A preferred PCR method to amplify the subject nucleic acids for testing by
CGH is a PCR adapter-linker amplification [Saunders et al., Nuc. Acids Res., 17: 9027
(1990); Johnson, Genomics, 6: 243 (1990) and PCT 90/00434 (published August 9,
1990)]. The labeled subject nucleic acid could be produced by such an adapter-linker PCR
method from a few hundred cells; for example, wherein the subject nucleic acid is tumor 
DNA, the source DNA could be a few hundred tumor cells. Such a method could 
provide a means to analyze by CGH clonal sub-populations in a tumor. 

Another preferred PCR method is a method employing a mixture of primers 
described in Meltzer et al., "Rapid Generation of Region Specific Probes by Chromosome 
Microdissection and their Application: A Novel Approach to Identify Cryptic 
Chromosomal Rearrangements," Nature Genetics, 1(1): 24-28 (April 1992).
Microdissection of sites in the reference metaphase spread that produce signals of interest
in CGH would permit PCR amplification of nucleic acid sequences bound at such sites.
The amplified nucleic acid could then be easily recovered and used to probe available 
libraries, as for example, cosmid libraries, so that the amplified sequences could be more 
rapidly identified. 

High copy repetitive sequences can be suppressed in amplifying the subject
nucleic acid by PCR. The PCR primers used for such a procedure are complementary to
the ends of the repetitive sequences. Thus, upon proper orientation, amplification of the
sequences flanked by the repeats occurs. One can further suppress production of repetitive
sequences in such a PCR procedure by first hybridizing complementary sequences to said
repetitive sequences wherein said complementary sequences have extended
non-complementary flanking ends or are terminated in nucleotides which do not permit
extension by the polymerase. The non-complementary ends of the blocking sequences
prevent the blocking sequences from acting as a PCR primer during the PCR process.
Primers directed against the Alu and L1 repetitive DNA families have allowed the
selective amplification of human sequences by interspersed repetitive sequence PCR
(IRS-PCR) [Nelson et al., PNAS, 86: 6686 (1989); Ledbetter et al., Genomics, 6: 475 (1990)].

Archived Material 

An important aspect of this invention is that nucleic acids from archived tissue 
specimens, for example, paraffin-embedded or formalin-fixed pathology specimens, can 
be tested by the methods of CGH. Said nucleic acid cannot, of course, be prepared into 
chromosome spreads for traditional cytogenetic chemical staining. Also, it is difficult for 
large enough restriction fragments to be extracted from such material for other 
conventional research tools, such as Southern analysis. However, the nucleic acid from
such specimens can be extracted by known techniques such as those described in Greer et
al., Anatomic Pathology, 25(2): 117-124 (1991) and Dubeau et al., Cancer Res., 46:
2964-2969 (1986), and if necessary, amplified for testing by various CGH methods. Such
nucleic acid can be amplified by using a polymerase chain reaction (PCR) procedure
(described above), for example, by the method described in Greer et al., supra, wherein
DNA from paraffin-embedded tissues is amplified by PCR.

A particular value of testing such archived nucleic acid is that such specimens 
are usually keyed to the medical records of the patients from whom the specimens were 
taken. Therefore, valuable diagnostic/prognostic associations can be made between the 
revealed cytogenetic state of patients' nucleic acid material and the medical histories of 
treatment and outcome for those patients. For example, information gathered by CGH 
can be used to predict the invasiveness of a tumor based upon its amplification and/or 
deletion pattern matched to associations made with similar patterns of patients whose 
outcomes are known. 

Analogously, other nucleic acid that is fixed by some method, as, for example, 
archeological material preserved through natural fixation processes, can also be studied by 
CGH procedures. As indicated above, copy number differences between species provide 
information on the degree of similarity and divergence of the species studied. 
Evolutionarily important linkages and disjunctions between and among species, extant or 
extinct, can be made by using the methods of CGH. 

Tumor Cytogenetics

CGH provides the means to assess the association between gene amplification 
and/or deletion and the extent of tumor evolution. Correlation between amplification
and/or deletion and stage or grade of a cancer may be prognostically important because
such information may contribute to the definition of a genetically based tumor grade that 
would better predict the future course of disease with more advanced tumors having the 
worst prognosis. In addition, information about early amplification and/or deletion events 
may be useful in associating those events as predictors of subsequent disease progression. 
Gene amplification and deletions as defined by CGH to, for example, normal metaphase 
spreads (genomic site, intensity of the signal and/or differences in signal ratios, and 
number of different genomic sites at which the copy number differences occur) can be 
associated with other known parameters such as tumor grade, histology, Brd/Urd labeling 
index, hormonal status, nodal involvement, tumor size, survival duration and other tumor 
properties available from epidemiological and biostatistical studies. For example, tumor 
DNA to be tested by CGH could include atypical hyperplasia, ductal carcinoma in situ, 
stage I-III cancer and metastatic lymph nodes in order to permit the identification of
associations between amplifications and deletions and stage. 

The associations made may make possible effective therapeutic intervention. 
For example, consistently amplified regions may contain an overexpressed gene, the 
product of which may be able to be attacked therapeutically (for example, the growth
factor receptor tyrosine kinase, p185HER2).

CGH hybridizations of nucleic acids from cells of primary cancers that have 
metastasized to other sites can be used to identify amplification and/or deletion events that 
are associated with drug resistance. For example, the subject nucleic acids to be analysed 
could be selected so that approximately half are from patients whose metastatic disease 
responded to chemotherapy and half from patients whose tumors did not respond. If gene 
amplification and/or deletion is a manifestation of karyotypic instability that allows rapid
development of drug resistance, more amplification and/or deletion in primary tumors
from chemoresistant patients than in tumors in chemosensitive patients would be expected. 
For example, if amplification of specific genes is responsible for the development of drug 
resistance, regions surrounding those genes would be expected to be amplified 
consistently in tumor cells from pleural effusions of chemoresistant patients but not in the 
primary tumors. Discovery of associations between gene amplification and/or deletion 
and the development of drug resistance may allow the identification of patients that will 
or will not benefit from adjuvant therapy. 
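The comparison between chemoresistant and chemosensitive groups described above can be sketched as a one-sided Fisher exact test on a 2x2 table. This is only an illustrative sketch: the function name and the counts in the usage note are hypothetical and do not come from this disclosure, and any other appropriate statistical test could be substituted.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: chemoresistant tumors showing the amplification
    b: chemoresistant tumors without it
    c: chemosensitive tumors showing the amplification
    d: chemosensitive tumors without it

    Returns the probability, under the hypergeometric null of no
    association, of seeing at least `a` amplified tumors in the
    chemoresistant group.
    """
    resistant = a + b          # size of the chemoresistant group
    amplified = a + c          # tumors with the amplification, both groups
    total = a + b + c + d
    upper = min(resistant, amplified)
    favorable = sum(
        comb(amplified, k) * comb(total - amplified, resistant - k)
        for k in range(a, upper + 1)
    )
    return favorable / comb(total, resistant)
```

For instance, with hypothetical counts of 8 of 10 chemoresistant tumors and 3 of 10 chemosensitive tumors amplified at a given region, `fisher_exact_greater(8, 2, 3, 7)` evaluates to about 0.035.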

Once a new region of amplification or deletion has been discovered by CGH, it 
can be studied in more detail using chromosome-specific painting [Pinkel et al., PNAS
(USA), 85: 9138-9142 (1988); EP Publication No. 430,402 (June 5, 1991)] with a
collection of probes that span the amplified or deleted region. Probes to amplified 
regions will show more signals than centromeric signals from the same chromosome, 
whereas probes to nonamplified regions will show approximately the same number of test 
and centromeric signals. For example, the amplified regions on 17q22-23 and 20qter 
(discussed as newly discovered regions of amplification in Example 1) show variability in 
size from tumor to tumor using CGH (the 17q22-23 region more markedly); it can be 
expected that the region containing the important gene(s) can be narrowed by mapping the 
regions of amplification in multiple tumors in more detail to find the portion that is 
amplified in all cases. Probes for those studies can be selected, for example, from specific
cosmid libraries produced by the National Laboratory Gene Library Project and/or from
the National Institutes of Health (NIH) genomic research projects.
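The narrowing step described above, finding the portion of a region that is amplified in all cases, amounts to intersecting the amplified intervals mapped in each tumor. A minimal sketch follows, assuming each tumor's amplified region is recorded as a (start, end) pair in a common coordinate along the chromosome; the coordinate system and the function name are assumptions made for illustration only.

```python
def common_amplified_region(regions):
    """Return the interval shared by every tumor's amplified region.

    regions: list of (start, end) pairs, one per tumor, in any common
    coordinate (for example, fractional length along the chromosome).
    Returns (start, end) of the overlap, or None if the regions do not
    all overlap or the list is empty.
    """
    if not regions:
        return None
    start = max(s for s, _ in regions)   # latest start bounds the overlap
    end = min(e for _, e in regions)     # earliest end bounds the overlap
    return (start, end) if start < end else None
```

With hypothetical intervals `[(10, 40), (20, 50), (15, 35)]` the shared region is `(20, 35)`; probes spanning that interval would be the first candidates for more detailed mapping.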

The c-erbB-2 oncogene, also referred to as HER-2 or neu, encodes a 185
kilodalton (Kd) protein. Studies have reported c-erbB-2 gene amplification in human
mammary tumor cell lines. [Kraus et al., EMBO J., 6: 605-610 (1987); van de Vijver et
al., Mol. Cell Biol., 7: 2019-2023 (1987).] Also, c-erbB-2 gene amplification in human
breast cancer has been shown to be associated with disease behavior, and may be a
predictor of clinical outcome. [Slamon et al., Science, 235: 177-182 (1987); Berger et
al., Cancer Res., 48: 1238-1243 (1988); Zhou et al., Cancer Res., 47: 6123-6125
(1987); and Venter et al., Lancet, ii: 69-71 (1987).] C-erbB-2 has also been shown to
be amplified in ovarian cancers. [Alitalo and Schwab, Advances in Cancer Res., 47:
235-281 (1986).]

C-myc is a proto-oncogene which is the cellular homolog of the transforming gene of
the chicken retrovirus MC29. In humans, c-myc lies on the long arm of chromosome 8,
at band q24, and spans about 5 kilobase pairs. The myc protein is a phosphoprotein
present in the nucleus. The normal function of c-myc is unknown; however, it almost
certainly plays a role in cell division, and is expressed in normally growing cells as well
as in tumor cells. It is now widely believed that translocations involving c-myc lead to
altered transcription of the gene, contributing to malignant transformation.

Sequences from the N-myc member of the myc gene family have been shown to be
amplified as much as a thousandfold in some neuroblastomas. N-myc amplifications are
usually seen in the later stage III and IV tumors. Some small-cell lung carcinomas also
have amplified myc genes in double minute chromosomes (DMs) and homogeneously
staining regions (HSRs). Myc has also been shown to be amplified in colon cancer.
[Alitalo and Schwab, supra.] Again such amplifications are found in late stages of tumor
development, in the so-called variant cells that exhibit a more malignant behavior.
Amplifications can involve either c-myc, N-myc or another member of the myc gene
family, L-myc. [Watson et al., supra at pp. 1084-1086.]

In addition, overexpression has been observed for the p-glycoprotein gene
family associated with multi-drug resistance and for drug metabolizing enzymes such as
P450 containing enzymes and glutathione S-transferase. [Fairchild and Cowan, J.
Radiation Oncol. Biol. Phys., 20: 361-367 (1990).]

Identification of amplified and/or deleted genes is important to the management 
of cancer, for example, breast cancer, for several reasons: 

1) to improve prognostication; 

2) to detect amplification and/or deletion events that are associated with the 
development of drug resistance; and 

3) to improve therapy. 

For example, in regard to improving prognostication, in breast cancer the amplification of
oncogenes, such as int-2, erbB-2 and myc, occurs frequently and has been associated with
aggressive growth and poor prognosis in some studies. [Schwab and Amler, Genes,
Chromosomes & Cancer, 1: 181-193 (1990).] In regard to reason (2), gene
amplification has clearly been shown to lead to drug resistance in vitro (for example,
amplification of the dihydrofolate reductase gene confers resistance to methotrexate), and
is likely to occur in patients undergoing therapy as well (for example, as a result of
overexpression of glutathione S-transferase and p-glycoprotein). [Fairchild and Cowan, supra.]
Thus, the identification of resistance-linked genes would have a major impact on therapy
by allowing therapy modification as resistance-related gene amplification occurs. Therapy
could be improved by targeting, for specific therapy, tumors that overexpress specific
amplified genes.

Prenatal Diagnosis

Prenatal screening for disease-linked chromosome aberrations (e.g., trisomy 21) 
is enhanced by the rapid detection of such aberrations by the methods and compositions of 
this invention. CGH analysis is particularly significant for prenatal diagnosis in that it 
yields more rapid results than are available by cell culture methods. 

Removal of Repetitive Sequences and/or Disabling the Hybridization Capacity of
Repetitive Sequences 

The following methods can be used to remove repetitive sequences and/or 
disable the hybridization capacity of such repetitive sequences. Such methods are 
representative and are expressed schematically in terms of procedures well known to those 
of ordinary skill in the art, and which can be modified and extended according to parameters
and procedures well known to those in the art. 

Bulk Procedures. In many genomes, such as the human genome, a major
portion of distributed (or shared) repetitive DNA is contained in a few families of highly 
repeated sequences such as Alu. These methods primarily exploit the fact that the 
hybridization rate of complementary nucleic acid strands increases as their concentration 
increases. Thus, if a mixture of nucleic acid fragments is denatured and incubated under 
conditions that permit hybridization, the sequences present at high concentration will 
become double-stranded more rapidly than the others. The double-stranded nucleic acid 
can then be removed and the remainder used in the hybridizations. Alternatively, the 
partially hybridized mixture can be used as the subject nucleic acid, the double-stranded 
sequences being unable to bind to the target. The following are methods representative of 
bulk procedures that are useful for disabling the hybridization capacity of repetitive 
sequences or removing those sequences from a mixture.

Self-reassociation. Double-stranded nucleic acid in the hybridization mixture is
denatured and then incubated under hybridization conditions for a time sufficient for the 
high-copy sequences in the mixture to become substantially double-stranded. The 
hybridization mixture is then applied to the reference chromosome spread. The remaining 
labeled single-stranded copies of the highly repeated sequences may bind throughout the 
reference chromosome spread producing a weak, widely distributed signal. 

Use of blocking nucleic acid. Unlabeled nucleic acid sequences which are 
complementary to those sequences in the hybridization mixture whose hybridization 
capacity it is desired to inhibit are added to the hybridization mixture. The subject 
nucleic acids and blocking nucleic acid are denatured, if necessary, and incubated under 
appropriate hybridization conditions. The sequences to be blocked become double- 
stranded more rapidly than the others, and therefore are unable to bind to the reference 
spread when the hybridization mixture is applied to the spread. In some cases, the 
blocking reaction occurs so quickly that the incubation period can be very short, and 
adequate results can be obtained if the hybridization mix is applied to the spread 
immediately after denaturation. Further, the probe and the target can be simultaneously 
denatured in some cases. A blocking method is generally described in the context of 
Southern analysis by Sealy et al., "Removal of Repeat Sequences from Hybridization
Probes," Nucleic Acid Research, 12:1905 (1985). Examples of blocking nucleic acids
include genomic DNA, a high-copy fraction of genomic DNA and particular sequences as 
outlined below. 

i. Genomic DNA. Genomic DNA contains all of the nucleic acid sequences
of the organism in proportion to their copy-number in the genome. Thus, adding
genomic DNA to the hybridization mixture increases the concentration of the high-copy
repeat sequences more than low-copy sequences, and therefore is more effective at
blocking the former.

ii. High-copy fraction of genomic DNA. Fractionating the genomic DNA to
obtain only the high-copy sequences and using them for blocking can be done, for
example, with hydroxyapatite as described below.

Removal of Sequences.

Hydroxyapatite. Single- and double-stranded nucleic acids have different
binding characteristics to hydroxyapatite. Such characteristics provide a basis commonly
used for fractionating nucleic acids. Hydroxyapatite is commercially available [e.g., Bio-
Rad Laboratories, Richmond, CA (USA)]. The fraction of genomic DNA containing
sequences with a particular degree of repetition, from the highest copy-number to single-
copy, can be obtained by denaturing genomic DNA, allowing it to reassociate under
appropriate conditions to a particular value of Cot, followed by separation using
hydroxyapatite. The single- and double-stranded nucleic acid can also be discriminated
by use of S1 nuclease. Such techniques and the concept of Cot are explained in Britten et
al., "Analysis of Repeating DNA Sequences by Reassociation," in Methods in
Enzymology, 29: 363-418 (1974).
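As a rough illustration of the Cot concept referred to above, the sketch below computes a Cot value (initial nucleotide concentration in moles per liter multiplied by reassociation time in seconds) and the fraction of a sequence class expected to remain single-stranded under idealized second-order kinetics. The average nucleotide mass of 330 g/mol, the function names, and the Cot1/2 values in the usage note are assumptions introduced here for illustration; real reassociation rates depend on salt, temperature and fragment length.

```python
# Assumed average mass of one nucleotide residue, in g/mol.
AVG_NUCLEOTIDE_MASS_G_PER_MOL = 330.0


def cot_value(dna_ug_per_ml, seconds):
    """Cot in mol nucleotide * s / L for a given DNA concentration and time."""
    # 1 ug/ml equals 1e-3 g/L; dividing by g/mol gives mol nucleotide per liter.
    molar_nucleotides = dna_ug_per_ml * 1e-3 / AVG_NUCLEOTIDE_MASS_G_PER_MOL
    return molar_nucleotides * seconds


def fraction_single_stranded(cot, cot_half):
    """Fraction still single-stranded for ideal second-order reassociation.

    cot_half is the Cot at which half of the sequence class has
    reassociated; highly repeated sequences have small cot_half values.
    """
    return 1.0 / (1.0 + cot / cot_half)
```

Under these assumptions, at a Cot of 1 a repeat class with a hypothetical Cot1/2 of 0.01 is about 99% double-stranded, while a single-copy class with a hypothetical Cot1/2 of 1000 remains more than 99% single-stranded, which is the basis for separating the two on hydroxyapatite.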

Reaction with immobilized nucleic acid. Removal of particular sequences can
also be accomplished by attaching single-stranded "absorbing" nucleic acid sequences to a
solid support. Single-stranded source nucleic acid is hybridized to the immobilized
nucleic acid. After the hybridization, the unbound sequences are collected and used in
CGH. For example, human genomic DNA can be used to absorb repetitive sequences
from the subject nucleic acids. One such method is described by Brison et al., "General
Method for Cloning Amplified DNA by Differential Screening with Genomic Probes,"
Molecular and Cellular Biology, 2: 578-587 (1982). Briefly, minimally sheared human
genomic DNA is bound to diazonium cellulose or a like support. The source DNA,
appropriately cut into fragments, is hybridized against the immobilized DNA to Cot
values in the range of about 1 to 100. The preferred stringency of the hybridization
conditions may vary depending on the base composition of the DNA.

Prehybridization. Blocking of repeat sequence binding sites in the reference
genome by hybridization with unlabeled complementary sequences will prevent binding of 
labeled sequences in the subject nucleic acids that have the potential to bind to those sites. 
For example, hybridization with unlabeled genomic DNA will render the high-copy 
repetitive sequences in the reference genome double-stranded. Labeled copies of such 
sequences in the subject nucleic acids will not be able to bind when they are subsequently 
applied. 

In practice, several mechanisms can be combined to produce the desired 
contrast and sensitivity. 

Labeling the Nucleic Acid Fragments of the Subject Nucleic Acids

There are many techniques available for labeling single- and double-stranded
nucleic acid fragments of the subject nucleic acids. They include incorporation of
radioactive labels, e.g. Harper et al., Chromosoma, 83: 431-439 (1984); direct
attachment of fluorochromes or enzymes, e.g. Smith et al., Nuc. Acids Res., 13: 2399-
2412 (1985), and Connolly et al., Nuc. Acids Res., 13: 4485-4502 (1985); and various
chemical modifications of the nucleic acid fragments that render them detectable
immunochemically or by other affinity reactions, e.g. Tchen et al., "Chemically Modified
Nucleic Acids as Immunodetectable Probes in Hybridization Experiments," PNAS, 81:
3466-3470 (1984); Richardson et al., "Biotin and Fluorescent Labeling of RNA Using T4
RNA Ligase," Nuc. Acids Res., 11: 6167-6184 (1983); Langer et al., "Enzymatic
Synthesis of Biotin-Labeled Polynucleotides: Novel Nucleic Acid Affinity Probes,"
PNAS, 78: 6633-6637 (1981); Brigati et al., "Detection of Viral Genomes in Cultured
Cells and Paraffin-Embedded Tissue Sections Using Biotin-Labeled Hybridization
Probes," Virol., 126: 32-50 (1983); Broker et al., "Electron Microscopic Visualization of
tRNA Genes with Ferritin-Avidin: Biotin Labels," Nuc. Acids Res., 5: 363-384 (1978);
Bayer et al., "The Use of the Avidin Biotin Complex as a Tool in Molecular Biology,"
Methods of Biochem. Analysis, 26: 1-45 (1980); Kuhlmann, Immunoenzyme
Techniques in Cytochemistry (Weinheim, Basel, 1984); Langer-Safer et al., PNAS
(USA), 79: 4381 (1982); Landegent et al., Exp. Cell Res., 153: 61 (1984); and
Hopman et al., Exp. Cell Res., 169: 357 (1987). Thus, as indicated, a wide variety of
direct and/or indirect means are available to enable visualization of the subject nucleic
acid sequences that have hybridized to the reference genome. Suitable visualizing means
include various ligands, radionuclides, fluorochromes and other fluorescers,
chemiluminescers, enzyme substrates or co-factors, particles, dyes and the like. Some
preferred exemplary labeling means include those wherein the probe fragments are
biotinylated, modified with N-acetoxy-N-2-acetylaminofluorene, modified with fluorescein
isothiocyanate or other fluorochromes, modified with mercury/TNP ligand, sulfonated,
digoxigeninated or contain T-T dimers.

A preferred method of labeling is tailing by terminal transferase labeling. 
Another preferred method is random priming with mixed sequence primers followed by 
polymerase extension. This has the additional feature of amplifying the amount of subject
DNA, if several cycles are used, which is useful when only a small amount of DNA was
originally obtained from the subject cell or cell population. 

The key feature of labeling is that the subject nucleic acid fragments bound to 
the reference spread be detectable. In some cases, an intrinsic feature of the subject 
nucleic acid, rather than an added feature, can be exploited for this purpose. For 
example, antibodies that specifically recognize RNA/DNA duplexes have been 
demonstrated to have the ability to recognize probes made from RNA that are bound to 
DNA targets [Rudkin and Stollar, Nature, 265: 472-473 (1977)]. The RNA used is
unmodified. Nucleic acid fragments can be extended by adding "tails" of modified 
nucleotides or particular normal nucleotides. When a normal nucleotide tail is used, a 
second hybridization with nucleic acid complementary to the tail and containing 
fluorochromes, enzymes, radioactivity, modified bases, among other labeling means,
allows detection of the bound nucleic acid fragments. Such a system is commercially
available from Enzo Biochem [Biobridge Labeling System; Enzo Biochem Inc., New
York, N.Y. (USA)].

Another example of a means to visualize the bound nucleic acid fragments 
wherein the nucleic acid sequences do not directly carry some modified constituent is the 
use of antibodies to thymidine dimers. Nakane et al., Acta Histochem. Cytochem., 20
(2):229 (1987), illustrate such a method wherein thymine-thymine dimerized DNA (T-T 
DNA) was used as a marker for in situ hybridization. The hybridized T-T DNA was 
detected immunohistochemically using rabbit anti-T-T DNA antibody. 

All of the labeling techniques disclosed in the above references may be 
preferred under particular circumstances. Further, any labeling techniques known to 
those in the art would be useful to label the subject nucleic acids of this invention.
Several factors govern the choice of labeling means, including the effect of the label on
the rate of hybridization and binding of the nucleic acid fragments to the chromosomal
DNA, the accessibility of the bound nucleic acid fragments to labeling moieties applied
after initial hybridization, the mutual compatibility of the labeling moieties, the nature and
intensity of the signal generated by the label, the expense and ease with which the label is
applied, and the like.

Several different subject nucleic acids, each labeled by a different method, can 
be used simultaneously. The binding of different nucleic acids can thereby be 
distinguished, for example, by different colors. 

In Situ Hybridization

Application of the subject nucleic acids to the reference chromosome spreads is 
accomplished by standard in situ hybridization techniques. Several excellent guides to the
technique are available, e.g., Gall and Pardue, "Nucleic Acid Hybridization in
Cytological Preparations," Methods in Enzymology, 21: 470-480 (1981); Henderson,
"Cytological Hybridization to Mammalian Chromosomes," International Review of
Cytology, 76: 1-46 (1982); and Angerer et al., "In Situ Hybridization to Cellular RNAs,"
in Genetic Engineering: Principles and Methods, Setlow and Hollaender, Eds., Vol. 7,
pgs. 43-65 (Plenum Press, New York, 1985).

Generally, in situ hybridization comprises the following major steps: (1) fixation
of tissue or biological structure to be examined; (2) prehybridization treatment of the
biological structure to increase accessibility of target DNA, and to reduce nonspecific
binding; (3) hybridization of the mixture of nucleic acids to the nucleic acid in the
biological structure or tissue; (4) posthybridization washes to remove nucleic acid
fragments not bound in the hybridization; and (5) detection of the hybridized nucleic acid
fragments. The reagents used in each of these steps and their conditions of use vary
depending on the particular situation.

Under the conditions of hybridization wherein human genomic DNA is used as 
an agent to block the hybridization capacity of the repetitive sequences, the preferred size 
range of the nucleic acid fragments is from about 200 bases to about 1000 bases, more 
preferably about 400 to 800 bases for double-stranded, nick-translated nucleic acids and 
about 200 to 600 bases for single-stranded or PCR adapter-linker amplified nucleic acids. 

Example 1 provides details of a preferred hybridization protocol. Basically the 
same hybridization protocols as used for chromosome-specific painting as described in 
Pinkel et al., PNAS (USA), 85: 9138-9142 (1988) and in EP Pub. No. 430,402
(published June 5, 1991) are adapted for use in CGH. 

The following representative examples of performing CGH methods of this 
invention are for purposes of illustration only and are not meant to limit the invention in 
any way. 

Example 1 

DNA from Breast Cancer Lines Hybridized to Normal Metaphase Spreads 

In this Example, methods of this invention to analyze genomes by Comparative 
Genomic Hybridization (CGH) are exemplified by hybridizations of breast cancer cell 
lines to normal metaphase spreads. The target metaphase spreads were pre-hybridized 
with unlabeled human placental DNA to block the high copy repeat sequences. In this 
representative example, the hybridization mixture containing the extracted labeled DNA
from the cell lines contained unlabeled, repeat-enriched Cot-1 blocking DNA [obtained
from Bethesda Research Laboratories (BRL), Gaithersburg, MD (USA)].

The experiments outlined below include in the hybridization mixture for the 
subject genomes, that is, the breast cancer cell line DNAs, chromosome-specific repeat
sequence probes and chromosome-specific painting probes. Those probes labeled with 
biotin were included as an adjunct for identifying chromosomes in the metaphase 
preparations. The experiments were first performed without those chromosome-specific 
probes. Then each chromosome of interest was measured to determine its length which 
was considered along with other factors to determine its probable identity. The 
chromosome-specific probes were then used in the hybridization mixture to confirm the 
identity of the chromosome of interest. However, such probes are not necessary as the 
chromosomes could have been identified by the DAPI banding of the counterstain or by 
other chemical staining, such as staining with quinacrine, by a skilled cytogeneticist. 

Cell Lines and Isolation of DNA:

Six established breast cancer cell lines: BT-474, SK-BR-3, MCF-7, MDA-MB- 
361, MDA-MB-468 and T-47D were obtained from the American Type Culture 
Collection [Rockville, Maryland (USA)]. The breast cancer cell line 600MPE
was kindly provided by Dr. Helene S. Smith [Geraldine Brush Cancer Research Center, 
San Francisco, CA (USA)]. Cell lines were grown until they became confluent. Cells 
were then trypsinized, pelleted by centrifugation at 1500 RPM for 5 minutes and washed 
twice in phosphate buffered saline. The DNA was then isolated as described by 
Sambrook et al., Molecular Cloning: A Laboratory Manual, Vol. 2: 9.16-9.19 [Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY (USA) 1989].

Details concerning the established human breast cancer cell lines used herein
are as follows:

BT-474       originated from a human primary cancer; obtained from the ATCC,
             catalog # HTB 20;

SK-BR-3      originated from a human metastatic breast adenocarcinoma derived
             from a pleural effusion; obtained from the ATCC, catalog # HTB 30;

MDA-MB-361   originated as a metastatic tumor to the brain; obtained from the
             ATCC, catalog # HTB 27;

MCF-7        originated from a human metastatic pleural effusion; obtained from
             the ATCC, catalog # HTB 22;

T-47D        originated as a human metastatic pleural effusion; obtained from the
             ATCC, catalog # HTB 133;

600MPE       originated as a human metastatic pleural effusion; kindly provided by
             Dr. Helene S. Smith [Geraldine Brush Cancer Research Center, San
             Francisco, CA (USA)]; and

MDA-MB-468   originated as a metastatic pleural effusion; obtained from the ATCC,
             catalog # HTB 132.



Preparation of Normal Lymphocyte Metaphases:

Normal peripheral blood lymphocytes were stimulated by PHA, synchronized
by methotrexate treatment and blocked in metaphase using 0.05 µg/ml colcemid. Cells
were then centrifuged, washed and incubated in 75 mM KCl at 37°C for 15 minutes.
Cells were then fixed in methanol:acetic acid (3:1) and dropped onto slides. The slides
were stored under nitrogen at -20°C.



DNA Labeling: 

Cell line DNAs were labeled with digoxigenin-11-dUTP using nick translation
[Rigby et al., J. Mol. Biol., 113: 237 (1977); Sambrook et al., supra]. The optimal
size of the probe fragments after nick translation and before denaturing was 400-800 bps.
As indicated above, chromosome-specific probes were used in dual-color hybridizations to
verify the identification of chromosomes of interest in the metaphase spreads.
Representative examples of such chromosome-specific reference probes labeled with
biotin-14-dATP include the following:

1) a chromosome-specific painting probe for chromosome 20 prepared by the 
PCR adapter-linker method as described in PCT/US90/00434 published August 9, 1990; 

2) a chromosome 17 peri-centromeric repeat probe (cosmid ck17.10) isolated
by Anne Kallioniemi from a chromosome 17 cosmid library from Los Alamos National
Laboratory [Albuquerque, New Mexico (USA)]; an equivalent chromosome-specific
repeat sequence probe for chromosome 17 is commercially available from Oncor
[Gaithersburg, MD (USA)]; and

3) an alpha satellite repeat probe specific for the centromeric region of 
chromosome 8 [kindly provided by Dr. Heinz-Ulrich G. Weier; University of California 
Medical Center, Lab for Cell Analysis, San Francisco, CA (USA)]; that probe was 
generated by Dr. Weier using PCR with primers WA1 and WA2 as described in Weier et
al., Hum. Genet., 87: 489-494 (1991).

Those skilled in the art will recognize that there are many other equivalent probes
available that could be used for the confirmation purposes described. For example, whole
chromosome painting probes are now available for all the human chromosomes [Collins et
al., Genomics, 11: 997-1006 (1991)]. Also available are repeat sequence probes that
hybridize intensely and specifically to selected chromosomes [Trask et al., Hum. Genet.,
78: 251 (1988) and references cited therein].

Pretreatment and Prehybridization of Slides:

Lymphocyte metaphase preparations were first denatured in 70%
formamide/2XSSC (1XSSC is 0.15 M NaCl, 0.015 M NaCitrate), pH 7, at 70°C for 2
minutes and dehydrated in a sequence of 70%, 85% and 100% ethanol. The slides were
then air dried and treated with 10 µg/50 ml Proteinase K [Boehringer Mannheim GmbH,
Indianapolis, IN (USA)] for 7.5 minutes at 37°C in a buffer containing 20 mM Tris and 2
mM CaCl2 (pH 7.5). Ethanol dehydration was then done as described above, and the
slides were prehybridized with ten µl of a hybridization mixture, consisting of 20 µg
unlabeled human placental DNA [obtained from Sigma, St. Louis, MO (USA); size of the
fragments is 200-700 bps] in 50% formamide, 10% dextran sulphate and 2XSSC (pH 7)
for 60 minutes at 37°C. Before the prehybridization mixture was applied to the slides, it
was denatured in a 70°C water bath for 5 minutes. After prehybridization, the slides
were washed once in 2XSSC and dehydrated with ethanol as described above.
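The buffer strengths quoted in this protocol follow directly from the stated definition of 1XSSC (0.15 M NaCl, 0.015 M NaCitrate). The short sketch below, offered only as a convenience calculation, scales that definition to any strength and converts the quoted amounts into concentrations; the function and variable names are illustrative assumptions.

```python
# Composition of 1XSSC as defined in the protocol text (mol/L).
SSC_1X_MOLAR = {"NaCl": 0.15, "Na citrate": 0.015}


def ssc_molarity(fold):
    """Molar composition of SSC at the given strength (2 for 2XSSC, etc.)."""
    return {salt: molar * fold for salt, molar in SSC_1X_MOLAR.items()}


def concentration(amount, volume):
    """Amount per unit volume, in whatever units are supplied."""
    return amount / volume
```

For example, `ssc_molarity(2)` gives 0.30 M NaCl and 0.030 M NaCitrate for the 2XSSC used here, `concentration(10, 50)` gives 0.2 µg/ml for the Proteinase K treatment, and `concentration(20, 10)` gives 2 µg/µl of placental DNA in the prehybridization mixture.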

Hybridization:

Five µg of unlabeled, repeat-enriched Cot-1 blocking DNA [BRL,
Gaithersburg, MD (USA)] and 60 ng of digoxigenin-labeled cell line DNA and 20-60 ng
of biotin-labeled reference probes (for verification of chromosome identification) were
mixed together and 1/10 vol of 3M Na-acetate was added. DNA was precipitated by
adding 2 volumes of 100% ethanol followed by centrifugation in a microcentrifuge for 30
minutes at 15,000 RPM. Ethanol was removed and the tubes were allowed to dry until
all visible ethanol had evaporated. Ten µl of hybridization buffer consisting of 50%
formamide, 10% dextran sulphate and 2XSSC (pH 7) was then added, followed by careful
mixing. DNAs in the hybridization buffer were then denatured for 5 minutes at 70°C
followed by a 60 minute renaturation at 37°C. The hybridization mixture was then added
to the prehybridized lymphocyte metaphase slides. Hybridization was carried out under a
coverslip in a moist chamber for 3-4 days at 37°C.
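The precipitation step above (1/10 volume of 3M Na-acetate followed by 2 volumes of 100% ethanol) can be turned into working volumes as sketched below. The text does not say whether the "2 volumes" refers to the DNA mixture before or after the acetate is added; this sketch assumes the latter, which is a common laboratory convention, and the function name and returned keys are illustrative only.

```python
def precipitation_volumes(sample_ul, acetate_stock_molar=3.0):
    """Volumes (in ul) for an ethanol precipitation of a DNA mixture.

    Assumption: ethanol is added at 2 volumes of the sample after the
    Na-acetate has been mixed in.
    """
    acetate_ul = sample_ul / 10.0               # 1/10 vol of Na-acetate stock
    salted_ul = sample_ul + acetate_ul          # volume before ethanol
    ethanol_ul = 2.0 * salted_ul                # 2 volumes of 100% ethanol
    # Na-acetate molarity after mixing with the sample, before ethanol.
    acetate_molar = acetate_stock_molar * acetate_ul / salted_ul
    return {
        "acetate_ul": acetate_ul,
        "ethanol_ul": ethanol_ul,
        "acetate_molar_before_ethanol": acetate_molar,
    }
```

For a hypothetical 50 µl DNA mixture this gives 5 µl of 3M Na-acetate and 110 µl of ethanol, with the acetate at roughly 0.27 M before the ethanol is added.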

Immunofluorescent Probe Detection:

The slides were washed three times in 50% formamide/2XSSC, pH 7, twice in
2XSSC and once in 0.1XSSC for 10 minutes each at 45°C. After washing, the slides
were immunocytochemically stained at room temperature in three steps (30-45 minutes
each). Before the first immunocytochemical staining, the slides were preblocked in 1%
BSA/4XSSC for 5 minutes. The first staining step consisted of 2 µg/ml Texas Red-
Avidin [Vector Laboratories, Inc., Burlingame, CA (USA)] in 1% BSA/4XSSC. The
slides were then washed in 4XSSC, 4XSSC/0.1% Triton X-100, 4XSSC, and PN (a
mixture of 0.1 M NaH2PO4 and 0.1 M Na2HPO4, pH 8, and 0.1% Nonidet P-40) for 10
minutes each and preblocked with PNM (5% Carnation dry milk, 0.02% Na-azide in PN
buffer) for 5 minutes. The second antibody incubation consisted of 2 µg/ml FITC-
conjugated sheep anti-digoxigenin [Boehringer Mannheim GmbH, Indianapolis, IN
(USA)] and 5 µg/ml anti-avidin [Vector Laboratories, Burlingame, CA (USA)] in PNM
followed by three PN washes, 10 minutes each. After the PNM block, the third
immunochemical staining was done using rabbit anti-sheep FITC antibody (1:50 dilution)
(Vector Laboratories) and 2 µg/ml Texas Red-Avidin in PNM. After three PN washes,
nuclei were counterstained with 0.8 µM 4',6-diamidino-2-phenylindole (DAPI) in an antifade
solution.

Fluorescence Microscopy and Interpretation of Results:

A Nikon fluorescence microscope [Nikon Inc., Garden City, NY (USA)] 
equipped with a double band pass filter [Chroma Technology, Brattleboro, VT (USA)]
and a 100X objective was used for simultaneous visualization of the FITC and Texas Red
signals. Hybridization of the breast cancer cell line DNAs was seen as a more or less 
uniform faint green background staining of all metaphase chromosomes with the exception 
of the Y-chromosome. As the breast cancer cell lines are of course of female origin, they 
did not contain Y chromosomal DNA. The absence of said green staining of the Y 
chromosome of the metaphase spread (seen in Figure 8) is exemplary of the manner in 
which a cytogenetically significant deletion would be visualized. Using a fluorescence 
microscope, amplified sequences can be seen as bright green dots or bands along the 
chromosome arms. 

To facilitate the display of the results and to improve the sensitivity of detecting 
small differences in fluorescence intensity, a digital image analysis system (QUIPS) was 
used. QUIPS (an acronym for quantitative image processing system) is an automated 
image analysis system based on a standard Nikon Microphot SA [Nikon Inc., Garden 
City, NY (USA)] fluorescence microscope equipped with an automated stage, focus 
control and filterwheel [Ludl Electronic Products Ltd., Hawthorne, NY (USA)]. The 
filterwheel is mounted in the fluorescence excitation path of the microscope for selection 
of the excitation wavelength. Special filters [Chroma Technology, Brattleboro, VT (USA)] 
in the dichroic block allow excitation of multiple dyes without image registration shift. 
The microscope has two camera ports, one of which has an intensified CCD camera 
[Quantex Corp., Sunnyvale, CA (USA)] for sensitive high-speed video image display 
which is used for finding interesting areas on a slide as well as for focusing. The other 
camera port has a cooled CCD camera [model 200 by Photometrics Ltd., Tucson, AZ 
(USA)] which is used for the actual image acquisition at high resolution and sensitivity. 

The cooled CCD camera is interfaced to a SUN 4/330 workstation [SUN 
Microsystems Inc., Mountain View, CA (USA)] through a VME bus. The entire 
acquisition of multicolor images is controlled using an image processing software 
package SCIL-Image [Delft Centre for Image Processing, Delft, Netherlands]. Other 
options for controlling the cameras, stage, focus and filterwheel as well as special 
programs for the acquisition and display of multicolor images were developed at the 
Division of Molecular Cytometry [University of California, Medical Center; San 
Francisco, CA (USA)] based on the SCIL-Image package. 

To display the results of the comparative hybridization, two or three 
consecutive images were acquired (DAPI, FITC and Texas Red) and superimposed. The 
FITC image was displayed after using the thresholding and contrast enhancement options 
of the SCIL-Image software. Exercising such options reduces the overall chromosomal 
fluorescence to make amplified sequences more readily visible. For example, using 
thresholding and contrast stretching, it was possible to enhance the contrast and 
quantification between the faint green background staining and staining originating from 
the amplified sequences in the cell lines. Alternatively, to facilitate the detection of 
deletions, it is possible to increase the overall chromosomal fluorescence and make areas 
of reduced fluorescence appear darker. The red color was used for reference probes to 
help in the identification of chromosomes. 

After identification of the chromosomes based on the use of reference probes in 
a dual-color hybridization, a site of amplification was localized by fractional length 
measurements along the chromosome arm (fractional length = distance of the hybridization 
signal from the p-telomere divided by the total length of the chromosome). The band 
location of the signal was then approximated from the fractional length estimate based on 
the ISCN 1985 ideograms [Harnden and Klinger, An International System for Cytogenetic 
Nomenclature, Karger AG, Basel, Switzerland (1985)]. 
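
The fractional length measurement described above amounts to a single division. The following Python sketch is illustrative only: the function name and the pixel values in the usage line are hypothetical and are not taken from the procedure itself.

```python
def fractional_length(distance_from_pter, chromosome_length):
    """Fractional position of a signal, measured from the p-telomere.

    Both arguments must be in the same unit (for example, pixels along
    the chromosome axis in the digitized image).
    """
    if chromosome_length <= 0:
        raise ValueError("chromosome_length must be positive")
    if not 0 <= distance_from_pter <= chromosome_length:
        raise ValueError("signal must lie between the two telomeres")
    return distance_from_pter / chromosome_length


# Hypothetical measurement: a signal 30 pixels from the p-telomere
# on a chromosome that is 120 pixels long.
print(fractional_length(30, 120))  # 0.25
```

The resulting fraction would then be compared against the band boundaries of the ideogram for the chromosome in question; that lookup is not sketched here because it depends on the ideogram data.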

Results: 

The results from the hybridizations are compiled in Table 2 along with other 
information known about the cell lines. Amplification at 17q12 (erbB-2 locus) and 
approximately 8q24 (MYC locus) was seen in lines showing amplification of erbB-2 and 
MYC whenever the level of amplification was greater than about five- to ten-fold using 
this CGH method. In addition, amplification of several megabase wide regions was seen 
in three cell lines at 17q22-23 and in three lines at 20qter; those amplifications were 
previously unknown sites of amplification and were not expected from other studies. All 
lines showing amplification showed amplification at more than one site. Evidence for 
co-amplification may be clinically important since co-amplification has been observed 
previously [van de Vijver et al., Mol. Cell Biol. 7: 2019-2023 (1987); Saint-Ruf et al., 
Oncogene 6: 403-406 (1991)], and is sometimes associated with poor prognosis [Borg et 
al., Br. J. Cancer 12: 136-142 (1991)]. Amplification at 17q22-23 has also been seen 
using probe DNA from primary tumors. 



TABLE 2

Results of Testing Breast Cancer Cell Lines for Amplification

Cell Line     Origin           Growth     Hormone    Known                    Amplification
                               rate; TD   receptor   amplification            detected by CGH
                                          E/P        (level)

BT-474        Primary cancer   48-96 hr   +/-        erbB-2                   17q12 (erbB-2), 17q22-23, 20qter
SK-BR-3       Pl. effusion     ?          ?          erbB-2 (9X), MYC (10X)   17q12 (erbB-2), 8q21, 8q23-24.1 (MYC), 20qter
MDA-MB-361    Brain met.       <96 hr     -/+        erbB-2 (4X)              17q22-23
MCF-7         Pl. effusion     <48 hr     +/+        erbB-2 (none)            17q22-23, 20qter
T-47D         Pl. effusion     ?          +/+        erbB-2 (none)            None
600 MPE       Pl. effusion     ?          ?          erbB-2 (none)            None
MDA-MB-468    Pl. effusion     ?          ?          erbB-2 (none)            None



Example 2 

Hybridizations with two different labeled subject DNAs as schematically 
outlined in Figures 6 and 7 were performed. One of the labeled subject DNAs hybridized 
was a cell line DNA as described in Example 1 and similarly labeled. The other labeled 
subject DNA was human genomic DNA labeled with biotin-14-dATP. 

The protocols were essentially the same as in Example 1 except that no 
chromosome-specific reference probes were used, and the same amount of the labeled 
human DNA as the labeled cell line DNA, that is, 60 ng, was hybridized. Of course, 
reference probes could be added to the hybridization mixture, but they need to be 
differently labeled to be distinguishable. 

The results showed the normal DNA with a red signal and the cell line DNA 
with a green signal. The green to red ratios were determined along each chromosome. 
Amplification was indicated by an area where the signal was predominantly green 
whereas deletions were indicated by more red signals than in other areas of the 
chromosomes. 

Exemplary CGH results using breast cancer cell line 600MPE DNA and 
normal human DNA were as follows. As indicated above, the hybridization was 
performed using 5 µg Cot-1 DNA, 60 ng of digoxigenin-labeled 600MPE cell line DNA, 
and 60 ng of biotinylated normal human genomic DNA. The 600MPE DNA was 
detected with FITC (green) and the genomic DNA with Texas Red-Avidin (red). 

The 600MPE breast cancer cell line, the karyotype for which was published 
by Smith et al., JNCI 78: 611-615 (1987), contains one normal chromosome 1 and three 
marker chromosomes with chromosome 1 material in them: t(1q:13q), 1p(p22) and 
inv(1)(p36q21). Thus, the cell line is disomic for the p-telomere-p22, trisomic for 
p22-centromere and tetrasomic for the q-arm of chromosome 1. An idiogram of chromosome 
1 showing those different areas is illustrated in Figure 9. 

The comparative genomic hybridizations of this example apparently identified 
three different regions on chromosome 1 that could be separated according to the 
intensities of green and red colors. The q-arm of chromosome 1 had the highest intensity 
of green color (tumor DNA). The region from band p22 to the centromere was the 
second brightest in green, and the area from the p-telomere to band p22 had the highest 
intensity of red color (normal DNA). Those hybridization results were consistent with 
the traditional cytogenetic analyses of that cell line stated immediately above. 

However, further studies with CGH, as presented in Example 3, indicated 
that the CGH analysis of Example 2, as well as the published karyotype, were partially in 
error. The CGH analysis of Example 3 motivated additional confirmatory experiments, 
as described therein, leading to correction of the original CGH results and the published 
karyotype. 

Example 3 

Copy Number Karyotypes of Tumor DNA 

In the representative experiments of CGH in this example, biotinylated total 
tumor DNA (cell line and primary tumor DNA) and digoxigenin-labeled normal human 
genomic DNA are simultaneously hybridized to normal human metaphase spreads in the 
presence of unlabeled blocking DNA containing high-copy repetitive sequences, 
specifically unlabeled Cot-1 blocking DNA [BRL, Gaithersburg, MD (USA)]. The 
following paragraphs detail the procedures used for the representative CGH experiments 
of this example. 

DNA Labeling: 

DNAs used in this example were labeled essentially as shown above in 
Example 1. DNAs were labeled with biotin-14-dATP or digoxigenin-11-dUTP by nick 
translation [Rigby et al., supra; Sambrook et al., supra]. The optimal size for double 
stranded probe fragments after labeling was 600-1000 bp. 

Pretreatment of Metaphase Spreads: 

Lymphocyte metaphase preparations were denatured, dehydrated and air 
dried, treated with Proteinase K and dehydrated again as described in Example 1. 

Comparative Genomic Hybridization: 

Sixty ng of biotinylated test DNA, 60 ng of digoxigenin-labeled normal DNA 
and 5 µg of unlabeled Cot-1 DNA (BRL) were ethanol precipitated and dissolved in 10 µl 
of 50% formamide, 10% dextran sulfate, 2xSSC, pH 7. The probe mixture was denatured 
at 70°C for 5 minutes, allowed to reanneal at 37°C for 60 minutes and hybridized to 
normal male metaphase chromosomes for 3-4 days at 37°C. 

Immunofluorescent Probe Detection: 

The slides were washed as described above in Example 1, and 
immunocytochemically stained at room temperature in three thirty-minute steps: (I) 5 
µg/ml FITC-Avidin [Vector Laboratories, Inc., Burlingame, CA (USA)] and 2 µg/ml 
anti-digoxigenin-Rhodamine (Boehringer Mannheim GMBH); (II) 5 µg/ml anti-avidin (Vector 
Laboratories); and (III) 5 µg/ml FITC-avidin. Nuclei were counterstained with 
0.8 µM 4,6-diamidino-2-phenylindole (DAPI) in antifade solution. A Zeiss fluorescence 
microscope equipped with a double band pass filter [Chroma Technology, Brattleboro, 
VT (USA)] was used for simultaneous visualization of FITC and rhodamine signals. 

Digital Image Analysis System and Fluorescence Ratio Profiles 

The QUIPS system, essentially as described above in Example 1, was used to 
analyze the fluorescence signals quantitatively. Fluorescence ratio profiles along the 
chromosomes were extracted using the WOOLZ software package [developed at MRC, 
Edinburgh, Scotland] as follows: the DAPI image is used to set the morphological 
boundary of each chromosome by thresholding. The chromosome outline is smoothed by 
a number of opening and closing operations, and a modified Hilditch skeleton is calculated 
and taken to represent the medial axis of the chromosome. The DAPI image is expanded 
outwards in all directions until the intensity field levels off (when background is reached) 
or begins to rise (due to an adjacent chromosome). The intensity profile of each image 
along the medial axis and within the expanded DAPI image is then calculated by summing 
the green and red fluorescence pixel values along the sequence of lines perpendicular to 
and spaced at unit distance along the medial axis. Modal green and red intensity values 
corresponding to the expanded DAPI image are taken to represent the background 
fluorescence and used as the intensity origin. 
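
The profile extraction described above can be sketched in simplified form. This Python sketch is not the WOOLZ implementation; it assumes, purely for illustration, that the chromosome has already been straightened so that its medial axis runs down the image rows and each row is one perpendicular line. The function names, the boolean mask standing in for the expanded DAPI region, and the pixel values are all hypothetical.

```python
from collections import Counter


def modal_value(pixels):
    """Most frequent pixel value, taken here as the background level."""
    return Counter(pixels).most_common(1)[0][0]


def intensity_profile(image, mask, background):
    """Sum background-corrected pixel values across each row inside the mask.

    image: list of rows of pixel values; each row is assumed to be one
           line perpendicular to the medial axis (straightened chromosome).
    mask:  same shape, True where the pixel lies in the expanded DAPI region.
    """
    profile = []
    for row, mask_row in zip(image, mask):
        profile.append(sum(p - background for p, m in zip(row, mask_row) if m))
    return profile


def ratio_profile(green, red, mask):
    """Green-to-red ratio at each position along the medial axis."""
    def masked_pixels(image):
        return [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if m]

    green_bg = modal_value(masked_pixels(green))
    red_bg = modal_value(masked_pixels(red))
    green_profile = intensity_profile(green, mask, green_bg)
    red_profile = intensity_profile(red, mask, red_bg)
    # Positions with no red signal above background have no defined ratio.
    return [g / r if r > 0 else None for g, r in zip(green_profile, red_profile)]


# Hypothetical 2 x 5 images: background 10 (green) and 20 (red),
# with the chromosome occupying the middle column.
green = [[10, 10, 50, 10, 10], [10, 10, 90, 10, 10]]
red = [[20, 20, 60, 20, 20], [20, 20, 60, 20, 20]]
mask = [[True] * 5, [True] * 5]
print(ratio_profile(green, red, mask))  # [1.0, 2.0]
```

A real implementation would follow a curved medial axis and sample along lines perpendicular to it; the straightened-chromosome assumption is made here only to keep the sketch short.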
Cell Lines: 

5637         - Originated from a human primary bladder carcinoma; obtained 
               from the ATCC, catalog # HTB 9 

SK-BR-3      - Originated from a human metastatic breast adenocarcinoma, 
               derived from a pleural effusion; obtained from the ATCC, 
               catalog # HTB 30 

Colo 205     - Originated from a human colon adenocarcinoma; obtained from 
               the ATCC, catalog # CCL 222 

NCI-H508     - Originated from a human cecum adenocarcinoma; obtained from 
               the ATCC, catalog # CCL 253 

SW480        - Originated from a human colon adenocarcinoma; obtained from 
               the ATCC, catalog # CCL 228 

SW620        - Originated from a human lymph node metastasis of a colon 
               adenocarcinoma; obtained from the ATCC, catalog # CCL 227 

WiDr         - Originated from a human colon adenocarcinoma; obtained from 
               the ATCC, catalog # CCL 218 

SK-N-MC      - Originated from a human neuroblastoma (metastasis to supra-orbital 
               area); obtained from the ATCC, catalog # HTB 10 

CaLu3        - Originated from a human lung adenocarcinoma, derived from a 
               pleural effusion; obtained from the ATCC, catalog # HTB 55 

CaLu6        - Originated from a human anaplastic carcinoma, probably lung; 
               obtained from the ATCC, catalog # HTB 56 

NCI-H69      - Originated from a human small cell lung carcinoma; obtained from 
               the ATCC, catalog # HTB 119 

COLO 320HSR  - Originated from a human colon adenocarcinoma; obtained from the 
               ATCC, catalog # 220.1 

600 PE       - Originated from a human breast carcinoma; obtained from Dr. 
               Helene Smith and Dr. Ling Chen [Geraldine Brush Cancer 
               Research Center, San Francisco, CA (USA)]. This is the same as 
               the 600 MPE cell line described in Examples 1 and 2. 

BT-20        - Originated from a human breast carcinoma; obtained from the ATCC, 
               catalog # HTB 19 



The following are five fibroblast cell lines with total chromosomal number 
and X chromosomal number in parentheses, which were obtained from the NIGMS 
repository [Camden, NJ (USA)]: 

GM01723 (45,XO) 

GM08399 (46,XX) 

GM04626 (47,XXX) 

GM01415E (48,XXXX) 

GM05009B (49,XXXXX). 

Results and Discussion: 

Demonstrated herein is CGH's capability of detecting and mapping relative 
DNA sequence copy number between genomes. A comparison of DNAs from malignant 
and normal cells permits the generation of a "copy number karyotype" for a tumor, 
thereby identifying regions of gain or loss of DNA. 

Demonstrated is the use of dual color fluorescence in situ hybridization of 
differently labeled DNAs from a subject tumor genome and a normal human genome to a 
normal human metaphase spread to map DNA sequence copy number throughout the 
tumor genome being tested. Regions of gain or loss of DNA sequences, such as 
deletions, duplications or amplifications, are seen as changes in the ratio of the intensities 
of the two fluorochromes (used in this representative example) along the target 
chromosomes. Analysis of tumor cell lines and primary bladder tumors identified 16 
different regions of amplification, many in loci not previously known to be amplified. 
Those results are shown in Table 3 below. 

The tumor DNA is detected with the green fluorescing FITC-avidin, and the 
normal DNA with the red fluorescing rhodamine anti-digoxigenin. The relative amounts 
of tumor and normal DNA bound at a given chromosomal locus are dependent on the 
relative abundance of those sequences in the two DNA samples, and can be quantitated by 
measurement of the ratio of green to red fluorescence. The normal DNA in this example 
serves as a control for local variations in the ability to hybridize to target chromosomes. 
Thus, gene amplification or chromosomal duplication in the tumor DNA produces an 
elevated green-to-red ratio, and deletions or chromosomal loss cause a reduced ratio. The 
Cot-1 DNA included in the hybridization inhibits binding of the labeled DNAs to the 
centromeric and heterochromatic regions so those regions are excluded from the analysis. 

The fluorescence signals were quantitatively analyzed by means of a digital 
image analysis system as described above. A software program integrated the green and 
red fluorescence intensities in strips orthogonal to the chromosomal axis, subtracted local 
background, and calculated intensity profiles for both colors and the green-to-red ratio 
along the chromosomes. 

The ability of CGH to quantitate changes in sequence copy number that affect 
an entire chromosome was tested with the above-listed five fibroblast cell lines having 1 
to 5 copies of the X chromosome and two copies of each autosome. Hybridization of 
DNA from the 45,XO cell line (in green) together with normal female DNA (in red) 
resulted in a uniform green-red staining of the autosomes whereas the X chromosome 
appeared more red (Figure 10A). Hybridizations with DNA from cell lines carrying 2, 3, 
4 or 5 copies of the X chromosome resulted in an increasingly strong green fluorescence 
from the X chromosome in relation to the autosomes. The average green-to-red 
fluorescence ratio of the X chromosome (Figure 10B), when normalized to the average 
ratio for the autosomes within the same metaphase spread, increased linearly with the 
increasing number of X chromosomes [correlation coefficient (r) = 0.978]. Thus, CGH 
can quantitatively distinguish a change of plus or minus one copy of a chromosome at 
least up to 4 copies. 
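
The normalization and linearity check described above can be illustrated as follows. This Python sketch is a minimal illustration, not the analysis actually performed: the function names are hypothetical, and the ratio values in the usage lines are invented placeholders rather than measured data.

```python
import math


def normalized_x_ratio(x_ratio, autosome_ratios):
    """Divide the X chromosome green-to-red ratio by the mean autosome
    ratio from the same metaphase spread."""
    autosome_mean = sum(autosome_ratios) / len(autosome_ratios)
    return x_ratio / autosome_mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Placeholder values only: X copy numbers of the five fibroblast lines and
# invented normalized ratios, used to show how r would be computed.
copies = [1, 2, 3, 4, 5]
ratios = [0.6, 1.0, 1.3, 1.7, 1.9]
print(round(pearson_r(copies, ratios), 3))
```

Normalizing to the autosomes of the same metaphase removes spread-to-spread differences in overall staining intensity before the X chromosome ratios are compared across cell lines.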

Experiments showed that CGH could generate a complete copy number 
karyotype for a near-diploid breast cancer cell line, 600PE. According to the published 
karyotype for 600PE [Smith et al., JNCI 78: 611 (1987)], 600PE is near-diploid with 
five marker chromosomes having four copies of the q-arm of chromosome 1, monosomy 
16, and deletions of 9p, 11q and 17p. CGH using biotinylated 600PE DNA (in green) and 
normal digoxigenin-labeled DNA (in red) revealed the following relative copy number 
changes: gain of 1q and loss of 9p, 16q, 17p and distal 11q. The green-to-red ratio 
profiles for those aberrant chromosomes are shown in Figure 11. Only the q-arm of 
chromosome 16 showed decreased relative copy number suggesting that 16p was not 
deleted. That observation was subsequently confirmed by fluorescence in situ 
hybridization (FISH) to 600PE interphase cells using cosmid probes for the p- and q-arms 
of chromosome 16 [16p and 16q cosmid probes provided by Los Alamos National 
Laboratory, Los Alamos, NM (USA)]; two signals per nucleus for the 16p cosmid probe 
and one for the 16q cosmid probe permitted calibration of a green-to-red ratio of 1.0 as 
indicating two copies of a sequence. 

Thus, if the absolute copy number of any point in the tumor genome is 
known, relative copy numbers can be converted to actual copy numbers at all loci. The 
CGH results differed from the originally published karyotype in the region of 16p and 
proximal 1p. That discrepancy was resolved by locus-specific and chromosome-specific 
painting (FISH) that indicated that the components of one of the marker chromosomes had 
been misinterpreted by conventional cytogenetic analysis. 
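
The conversion from relative to absolute copy number can be written as a simple proportion once one locus has been calibrated. The Python sketch below assumes, as an idealization, that the green-to-red ratio scales linearly with copy number; the function and parameter names are hypothetical, and the default calibration (ratio 1.0 corresponding to two copies) is the one stated above for 600PE.

```python
def absolute_copy_number(ratio, calibration_ratio=1.0, calibration_copies=2):
    """Convert a green-to-red ratio to an estimated absolute copy number.

    Assumes the ratio is proportional to copy number, anchored at a locus
    whose copy number is known independently (for example, from FISH).
    """
    if calibration_ratio <= 0:
        raise ValueError("calibration_ratio must be positive")
    return calibration_copies * ratio / calibration_ratio


# With a ratio of 1.0 calibrated to two copies, a ratio of 0.5
# corresponds to one copy and a ratio of 2.0 to four copies.
print(absolute_copy_number(0.5))  # 1.0
print(absolute_copy_number(2.0))  # 4.0
```

In practice measured ratios may not follow this proportion exactly, so the output should be read as an estimate rather than a count.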

CGH with DNAs from two fibroblast cell lines [GM05877 and GM01142A 
from the NIGMS repository] detected small interstitial deletions around the RB1 locus in 
13q: del(13)(pter→q14.1::q21.2→qter) and del(13)(pter→q14.1::q22.1→qter). 
On the basis of the CGH analysis and measurement of the deletion size as a fraction of 
the length of chromosome 13 (total length 111 Mb), those deletions were estimated to span 
about 10 and 20 megabases (Mb), respectively. Thus it is possible that CGH can be used 
to screen DNA samples from solid tumors in order to identify large physical deletions 
that may uncover recessive mutant tumor suppressor genes. 
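
The size estimate described above is the deleted fraction of the chromosome multiplied by the chromosome's total length. The Python sketch below illustrates that arithmetic; the function name is hypothetical, the 111 Mb figure is the chromosome 13 length given in the text, and the fractional positions in the usage line are invented for illustration.

```python
CHR13_LENGTH_MB = 111  # total length of chromosome 13 as stated in the text


def deletion_size_mb(start_fraction, end_fraction,
                     chromosome_length_mb=CHR13_LENGTH_MB):
    """Estimate deletion size in Mb from its fractional start and end
    positions along the chromosome (0 = p-telomere, 1 = q-telomere)."""
    if not 0 <= start_fraction <= end_fraction <= 1:
        raise ValueError("fractions must satisfy 0 <= start <= end <= 1")
    return (end_fraction - start_fraction) * chromosome_length_mb


# Hypothetical breakpoints: a region of reduced ratio spanning fractional
# positions 0.40 to 0.49 covers about 9% of the chromosome.
print(round(deletion_size_mb(0.40, 0.49), 1))
```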

CGH was evaluated for its ability to detect increased gene copy number with 
cell lines that contained previously reported amplification of oncogenes. Figure 12A 
shows CGH with DNA from a colon cancer cell line COLO 320HSR, known to contain 
more than a 50-fold amplification of a 300 kb region around the myc oncogene [Kinzler et 
al., PNAS (USA) 83: 1031 (1986)]. The expected high green-to-red ratio at 8q24 
corresponding to the location of myc is clear. The height of the peak does not 
quantitatively reflect the level of amplification because the fluorescent signal spreads over 
a region of the chromosome that is larger than the length of the amplicon. That is 
apparently a result of the complex organization of the target DNA in the denatured 
chromosomes. 

The eight-fold amplification of the erbB2 oncogene in the SK-BR-3 breast 
cancer cell line also was detectable with CGH as a hybridization signal at 17q12 (Table 
3). High level amplifications such as those also could be detected in single-color 
hybridizations with the use of only labeled tumor DNA. 

Cytogenetic and molecular studies of primary tumors and cell lines often 
reveal homogeneously staining regions and double minute chromosomes that do not 
involve known oncogenes [Saint-Ruf et al., Genes Chrom. Cancer 2: 18 (1990); 
Bruderlein et al., Genes Chrom. Cancer 2: 63 (1990)]. CGH allows straightforward 
detection and mapping of such sequences. Table 3 contains a summary of the analysis 
with CGH of 11 cancer cell lines. Data in Table 3 is based on the visual inspection of a 
large number of metaphase spreads and on detailed digital image analysis of four to six 
metaphases for each sample. 

TABLE 3 

Mapping of amplified sequences in established cancer 
cell lines and primary tumors by CGH 

Specimen      Origin              Amplif. by CGH*                  Cytogenetic evidence
                                                                   of gene amplif.+

Cell lines:
5637          Bladder             3p25, 6p22                       DM
SK-BR-3       Breast              8q24 (myc), 8q21,
                                  17q12 (erbB2), 20q13
Colo 205      Colorectal
NCI-H508      Colorectal          14q12-13                         DM
SW480         Colorectal          8q24 (myc)                       DM
SW620         Colorectal          16q21-23                         HSR
WiDr          Colorectal          8q23-24 (myc)
SK-N-MC       Neuroblastoma       8q24 (myc)                       DM
CaLu3         Small cell lung     8p12-21, 8qtel,                  HSR
                                  17q12 (erbB2)
CaLu6         Small cell lung     13q32-34
NCI-H69       Small cell lung     2p24 (N-myc), 2p21, 2q21

Primary tumors:
UR140         Bladder carcinoma   16q21-22
UR145         Bladder carcinoma   6p22

* The oncogene most likely involved in this amplification is shown in parentheses. 
+ Cytogenetic information based on the ATCC Catalogue of Cell Lines & Hybridomas 
(1992). 

DM = double minute chromosomes, HSR = homogeneously staining regions. 

Sixteen amplified loci were mapped, many at regions of the genome where 
amplification had not previously been suspected. Thus, a large variety of genes may be 
amplified during cancer initiation and progression. In five of the 11 cell lines, more than 
one locus was amplified. Two or three separate loci on the same chromosome were 
amplified in four cell lines, which suggests a spatial clustering of chromosomal locations 
that undergo DNA amplification (Table 3 and Figure 12A). 

CGH was also applied to identify and map amplified DNA sequences in 
uncultured primary bladder tumors. Of the seven tumors tested, two showed evidence of 
DNA amplification but the loci were not the same (Table 3). Thus, a number of 
previously unsuspected genomic regions that might contain genes important for cancer 
progression have been identified by CGH. Further studies will elucidate which of those 
loci contain novel oncogenes and which represent coincidental, random DNA 
amplification characteristic of genomic instability. 

The detection and mapping of unknown amplified sequences that typically span 
several hundred kilobases (kb) to a few Mb demonstrated the usefulness of CGH for rapid 
identification of regions of the genome that may contain oncogenes. Analogously, 
detection of deletions may facilitate identification of regions that contain tumor suppressor 
genes. 

Further studies are necessary to establish to what extent allelic losses in tumors 
are caused by physical deletions. In clinical specimens, the detection of small copy 
number differences is more difficult than with cell lines because of the admixture of DNA 
from contaminating normal cells and because of intratumor heterogeneity. As indicated 
above, using PCR to prepare tumor DNA from a small number of tumor cells (as a tumor 
clonal sub-population) may assist in resolving that problem. Like RFLP, CGH 
emphasizes the detection of aberrations that are homogeneous in a cell population and 
averages those that are heterogeneous. 

At the current stage of development of CGH, sensitivity is primarily limited by 
the granularity of the hybridization signals in the metaphase chromosomes. Further 
improvements in sensitivity will be achieved by optimization of the probe concentration 
and labeling, and by the averaging of the green-to-red fluorescence ratios from several 
metaphase spreads. 
Example 4 

In the present study, we have used CGH to identify and map increases in DNA 
sequence copy number in 15 breast cancer cell lines and 33 uncultured primary breast 
tumors. 

MATERIALS AND METHODS 

DNA samples. Fifteen breast cancer cell lines (BT-20, BT-474, BT-483, 
MCF7, MDA-157, MDA-175, MDA-231, MDA-330, MDA-361, MDA-435, MDA-436, 
MDA-453, SK-BR-3, ZR-75-1, ZR-75-30) were obtained from American Type Culture 
Collection (Rockville, Maryland). The cells were grown in the recommended culture 
conditions in 75 cm² flasks until confluent. The trypsinized cells were suspended in a 
digestion buffer (0.1 mg/ml proteinase K, 100 mM NaCl, 10 mM Tris-Cl pH 8, 25 mM 
EDTA pH 8, 0.5% sodium dodecyl sulfate) and were incubated with shaking at 50°C 
overnight. High-molecular weight DNA was extracted using phenol-chloroform-isoamyl 
alcohol and precipitated with 7.5 M ammonium acetate and 100% ethanol. DNA was also 
isolated from thirty-three primary breast carcinomas obtained prospectively at surgery. 
Thirty of the carcinomas were ductal invasive, 1 was intraductal and 2 lobular. The 
post-operative TNM-stage distribution was stage I (6 cases), stage IIa (13 cases), stage IIb (8 
cases), stage III (3 cases), stage IV (1 case) and unknown (1 case). DNA was also 
isolated from the peripheral blood of 7 normal healthy individuals. One of these was 
used as the normal reference DNA in all CGH hybridizations. 

Comparative Genomic Hybridization. The target metaphase slides were 
prepared from PHA-stimulated peripheral blood lymphocytes from a normal male. To 
assess the hybridization characteristics, each batch of slides was extensively tested with 
labeled normal genomic DNA and with whole-chromosome painting probes. If evidence 
of dim or nonuniform hybridization was detected, the entire batch of slides was 
abandoned and a new batch was prepared. 

CGH was performed essentially as described above. Briefly, DNA samples 
were labeled either with biotin-14-dATP (test samples) or digoxigenin-11-dUTP (normal 
reference DNA) using the Bionick Labeling System (BRL, Gaithersburg, MD). The 
amounts of DNAse and DNA polymerase I were adjusted so that the probe fragment size 
distribution after labeling was 600-2000 bp (a smear in a non-denaturing agarose gel). 
Probe fragments of this size were necessary to obtain uniform, intense hybridization. 
Sixty to 100 ng of each of the labeled probes and 5 µg of unlabeled Cot-1 DNA were 
precipitated with ethanol. The DNAs were dissolved in 10 µl of hybridization buffer 
(50% formamide, 10% dextran sulfate, 2xSSC, pH 7). Metaphase slides were denatured 
in 70% formamide, 2xSSC (pH 7) at 70°C for 3 minutes, dehydrated in 70%, 85% and 
100% ethanol, treated with proteinase K (0.1 µg/ml in 20 mM Tris, 2 mM CaCl2, pH 7.5) 
at 37°C for 7.5 min and dehydrated again. The hybridization mixture was applied on 
slides and hybridized for 2-3 days at 37°C in a moist chamber. 

After hybridization, the slides were washed three times in wash buffer (50% 
formamide, 2xSSC, pH 7), twice in 2xSSC and once in 0.1xSSC at 45°C for 10 min 
each. Biotinylated DNA was detected with 5 µg/ml Avidin-FITC (green fluorescence) and 
digoxigenin-labeled DNA with 1 µg/ml anti-digoxigenin Rhodamine (red fluorescence). 
Only one layer of immunoreagents was used as this was found to reduce noise and 
provide a more uniform fluorescence signal. After staining, the slides were washed in 
4xSSC, 4xSSC/0.1% Triton-X, and 4xSSC for 10 min each. Samples were 
counterstained with 4,6-diamidino-2-phenylindole (DAPI) in an antifade solution. 

Digital image analysis. The hybridizations were analyzed using a digital image 
analysis system that was based on either a Nikon SA or Zeiss Axioplan microscope 
equipped with a cooled CCD camera (Photometrics Inc., Tucson, AZ), and a filter system 
consisting of a triple-band pass beam splitter and emission filters. Excitation of each 
fluorochrome was accomplished using single band pass excitation filters in a computer 
controlled filter wheel. This made it possible to collect sequential, properly registered 
images of the three fluorochromes (DAPI, FITC, and Rhodamine). The three-color 
images were processed with a Sun IPX workstation using Scil-Image software (TNO, 
Delft, Netherlands) for pseudo-color display. Contrast-stretched three-color images were 
used to visually inspect the color change along the metaphase chromosomes. A contrast- 
stretched single color image of the DAPI counterstain was used to identify chromosomes 
and assign copy number changes to individual chromosomal bands. 

In addition to visual analysis of the digital images, a quantitative analysis of 
green and red fluorescence intensities was performed with Xwoolz software as described 
in Kallioniemi, A., Kallioniemi, O.-P., Sudar, D., Rutovitz, D., Gray, J.W., Waldman, 
F. & Pinkel, D. (1992) Science 258, 818-821, and Kallioniemi, O.-P., Kallioniemi, A., 
Sudar, D., Rutovitz, D., Gray, J.W., Waldman, F. & Pinkel, D. (1993) Semin. Cancer 
Biol. 4, 41-46. The contour and medial axis of chromosomes were defined based on the 
DAPI counterstain. Local background fluorescence was determined for each chromosome 
and subtracted from the green and red images before analysis. Green and red 
fluorescence intensities were then determined along the chromosome from p-telomere to 
q-telomere by integrating fluorescence across the width of the chromosome perpendicular 
to the medial axis. Green and red fluorescence intensity ratio profiles were then 
calculated for each chromosome. All profiles were normalized so that the overall green 
to red ratio for the entire metaphase (within the segmented image of DAPI and red 
fluorescence) was set at 1.0. 
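
The normalization step described above can be sketched as follows. This Python sketch is one plausible reading, not the Xwoolz implementation: it assumes the overall ratio for the metaphase is the total green intensity divided by the total red intensity across all chromosome profiles, and the function name and sample values are hypothetical.

```python
def normalize_ratio_profiles(green_profiles, red_profiles):
    """Return green-to-red ratio profiles scaled so that the overall
    green-to-red ratio of the whole metaphase equals 1.0.

    green_profiles, red_profiles: one list per chromosome of
    background-corrected intensities along the chromosome axis.
    All red values are assumed to be positive.
    """
    total_green = sum(sum(profile) for profile in green_profiles)
    total_red = sum(sum(profile) for profile in red_profiles)
    overall_ratio = total_green / total_red
    return [
        [g / r / overall_ratio for g, r in zip(green, red)]
        for green, red in zip(green_profiles, red_profiles)
    ]


# Hypothetical two-chromosome metaphase: overall green/red is 12/4 = 3,
# so raw ratios of 2 and 4 become 2/3 and 4/3 after normalization.
green = [[2, 2], [4, 4]]
red = [[1, 1], [1, 1]]
print(normalize_ratio_profiles(green, red))
```

Scaling by the overall ratio makes profiles from different hybridizations comparable, since differences in labeling efficiency or exposure shift all ratios by a common factor.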

Interpretation of CGH images . Five metaphases from each hybridization were 
analyzed for the chromosomal locations of DNA sequence increases. These regions were 
determined using green to red fluorescence intensity ratio profiles and information gained 
during visual inspection of the digital images. Criteria used to define the increased DNA 
sequence copy number in tumors were based on comparisons of normal DNAs labeled 
and stained with two different colors. These included: green to red ratios that exceeded 
1.25 or small paired spots of green fluorescence clearly above the background level found 
in the normal vs. normal DNA comparisons (see below). High-level increases were 
defined as those chromosomal subregions where the green to red ratio exceeded 1.75. 
Increases that were not systematically present in all metaphases or that were seen only in 
one chromatid or in one of the two chromosome homologues were considered non- 
specific and were excluded from analysis. 
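
The ratio-based criteria above can be expressed as a small decision rule. The Python sketch below encodes only the numeric thresholds (1.25 and 1.75) and the requirement that an increase be present in all metaphases; the paired-spot criterion and the chromatid and homologue checks rely on visual inspection and are not modeled. Function and label names are hypothetical.

```python
GAIN_THRESHOLD = 1.25        # ratio above which copy number is called increased
HIGH_LEVEL_THRESHOLD = 1.75  # ratio above which the increase is called high-level


def classify_ratio(ratio):
    """Classify a single green-to-red ratio using the thresholds in the text."""
    if ratio > HIGH_LEVEL_THRESHOLD:
        return "high-level increase"
    if ratio > GAIN_THRESHOLD:
        return "increase"
    return "no increase"


def systematic_increase(ratios_per_metaphase):
    """True only if the ratio at a locus exceeds the gain threshold in
    every analyzed metaphase; otherwise the finding is treated as
    non-specific."""
    return bool(ratios_per_metaphase) and all(
        r > GAIN_THRESHOLD for r in ratios_per_metaphase
    )


# Hypothetical ratios at one locus in five metaphases.
print(systematic_increase([1.4, 1.5, 1.3, 1.6, 1.45]))  # True
print(systematic_increase([1.4, 1.5, 1.1, 1.6, 1.45]))  # False
```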

RESULTS 

Interpretation of CGH data was guided by control experiments. Comparisons 
among seven normal DNA specimens were used to establish normal levels of green to red 
fluorescence intensity ratio variation along the length of all human chromosomes while 
cell lines with known amplifications were used to assess sensitivity. Four of 6 breast 
cancer cell lines with known ERBB2 amplification and 3 of 5 with known BCL1 
amplification showed evidence of increased copy number by CGH at 17q12 and 11q13, as 
expected. All high level amplifications (5-15X) were detected by CGH, while those of a 
lower level (2-5X) were missed. No false positive ERBB2 or BCL1 amplifications were 
seen. 

Twenty-eight (85%) of the 33 primary breast carcinomas showed evidence of 
increased DNA sequence copy number involving one or more regions of the genome 
(Figs. 14-16). Twenty-one cases (64%) showed gains of whole chromosomes or 
chromosome arms and 20 (61%) showed increases in copy number involving only a 
region of an arm. All 15 breast cancer cell lines showed copy number increases, 14 of 
them (93%) whole chromosome or chromosome arm gains and 14 (93%) region copy 
number increases (Fig. 16, Table 4). The average number of changes per specimen was 
much higher in cell lines (7.7 ± 3.3) than in primary tumors (3.3 ± 2.6). 

The most common copy number changes were remarkably similar in 
primary tumors and cell lines. Gains of whole chromosome arms were most often found 
at 1q (36% of primary tumors/40% of cell lines) and at 8q (27%/40%), while regional 
copy number increases were seen repeatedly at 17q22-q24 (18%/67%) and at 20q13 
(18%/40%). The region on chromosome 17 was distinctly distal to the ERBB2 gene 
locus and was amplified independently of it. Increases in copy number at 17q22-q24 and 
20q13 in BT-474, MCF7, and MDA-361 cell lines were validated using fluorescence in 
situ hybridization with specific probes to these loci. 

DNA sequence copy number increases were also found in a large number of 
other chromosomal regions, but at a lower frequency (Table 5). Overall, a total of 26 
loci (15 in primary tumors and 19 in cell lines) appeared to be significant as they were 
involved in either 1) high-level copy number increase of a small chromosomal segment, 
or 2) low-level regional increases that occurred in at least three primary tumors (≥9% of 
cases) or two (≥13%) cell lines. Usually, several regions were simultaneously amplified 
in a single specimen. 

The size distribution of the region involved in DNA sequence copy number 
increases was continuous, ranging from very small regions to whole arm or whole 
chromosome gains. 

DISCUSSION 

The present results on breast cancer show how CGH provides new information 
on 1) the overall frequency of DNA gains and amplifications, 2) the clustering of these 
changes to particular chromosomal subregions, and 3) the size and number of regions 
affected in individual tumors. This information also illustrates the advantages of CGH as 
compared to conventional molecular genetic methods that are usually restricted to the 
analysis of only one locus at a time. While cytogenetic analysis does provide a similar 
overview, it is limited by technical problems in preparing metaphase chromosomes from 
solid tumors, inability to determine the genomic origin of the amplified sequences (e.g., 
HSRs, DMs), as well as difficulties in the unambiguous identification of all changes in 
highly aberrant genomes. 

Our results substantially extend the current knowledge of the frequency of DNA 
gains and the chromosomal regions involved in breast cancer. The frequently occurring 
whole arm gains of 1q and 8q have been previously reported in primary tumors by 
cytogenetic techniques, but most of the regional copy number changes, e.g. at 17q22-q24 
and 20q13, have not. Loci previously known to be amplified in breast cancer (8p12, 
8q24, 11q13, 15q25, and 17q12) accounted for only 22% of all amplification events 
(subregional DNA sequence copy number increases) in the primary tumors and 18% of 
those in the cell lines. This illustrates the fact that studies limited to known oncogene loci 
underestimate the level of genetic instability in breast cancer and provide a restricted view 
of the regions involved. 

17q22-q24 and 20q13 emerged as major new regions of amplification in breast 
cancer. It is believed that these are loci where previously unknown genes important in 
breast cancer progression can be found. The region on 17q is telomeric to the ERBB2 
and BRCA1 genes and is amplified independently of ERBB2. Based on CGH analysis, 
this region appears fairly large and may span many different genes (e.g. HOX2, NGFR, 
WNT3, GH1, GH2, PRKCA, HLR1, NME1). The region in 20q13 also contains several 
known genes, including SRC, ADA, RPN2, GNAS1 and ZNF8. Further detailed studies 
on 17q and 20q are necessary to determine which, if any, of these genes are located in 
the minimal common region of amplification and expressed in breast cancer cells. 

The ability of CGH to detect amplification is dependent on the level of 
amplification and the size of the region affected. While we have been able to detect more 
than 5-7 fold amplification of oncogenes in homogeneous cell lines (Kallioniemi, A., 
Kallioniemi, O.-P., Sudar, D., Rutovitz, D., Gray, J.W., Waldman, F. & Pinkel, D. 
(1992) Science 258, 818-821; Kallioniemi, O.-P., Kallioniemi, A., Sudar, D., Rutovitz, 
D., Gray, J.W., Waldman, F. & Pinkel, D. (1993) Semin. Cancer Biol. 4, 41-46), the 
sensitivity of CGH in primary tumors is compromised by normal cell contamination and 
intratumor heterogeneity. The actual amplification frequencies, especially those involving 
small amplicons, may therefore be higher than reported here. Furthermore, amplification 
is only one of the mechanisms by which gene expression may be elevated in tumor cells. 
The same gene may be amplified in some tumors and upregulated by other means in other 
tumors. Thus, amplification frequency may not directly reflect the relative importance of 
the genes in the various chromosomal regions, and virtually all of the 26 different loci 
reported here may contain genes whose elevated expression contributes to tumor 
progression. 

CGH not only detects and maps gains of DNA sequences, but also provides an 
estimate of the size of the affected region. A new finding of this study was that the size 
distribution of the region involved in DNA gains was virtually continuous, starting from 
the high-level amplifications of small regions (apparently ranging from a few hundred kb 
to 1-2 Mb and involving ERBB2, BCL1, and several previously unknown regions) to 
gains of whole chromosome arms. In addition to "classical gene amplification", as 
described in cell lines selected for drug resistance in vitro, where usually a single target 
gene is activated, primary tumors more often show copy number increases of large 
regions (up to tens of Mb) that may simultaneously affect the expression of many 
different genes. For example, several different regions of 8q (8q21, 8q22-q23 and 8q24) 
were separately amplified in a few specimens. It is possible that the very common gain 
of the entire long arm of chromosome 8 is selected for because it leads to a simultaneous 
increase of copy number of all three of these regions. Studies of 11q13 amplification in 
breast cancer and 12q13-q14 amplification in sarcomas support the concept that a single 
amplicon may contain several expressed sequences. 

Many breast cancer cell lines and primary tumors showed simultaneous 
amplification of several distinct regions of the genome, suggesting that breast cancer cells 
are genetically very unstable. It appears likely that the amplification of several different 
genes actually contributes to growth advantage. However, co-amplification of an 
important oncogene in one locus and random DNA sequences from another locus cannot 
be excluded. 

Chromosomal regions amplified in cell lines tended to be smaller and apparently 
of higher copy number than those in the primary tumors. Based on in vitro drug 
resistance models, the size of the amplicon tends to become smaller with time. The small 
size of the amplified regions in cell lines may thereby reflect the more advanced state of 
the amplification process. The more advanced genetic composition of breast cancer cell 
lines is also evident from the much higher average number of DNA copy number changes 
per specimen as compared with primary tumors. However, the chromosomal regions 
involved were remarkably similar. Most of the highly prevalent changes, such as copy 
number increases of 8q24, 11q13, 17q22-q24 and 20q13, as well as gains of 1q, 8q and 
20q, were found frequently both in primary tumors and cell lines. These findings indicate 
that most of the genetic aberrations present in highly evolved cell lines are representative 
of those seen in uncultured primary tumors and that cell lines provide a valuable resource for 
more detailed mapping and isolation of genes implicated in breast cancer. 

TABLE 4. SUMMARY OF CHROMOSOMAL REGIONS INVOLVED IN DNA 
SEQUENCE COPY NUMBER INCREASES IN 15 BREAST CANCER CELL LINES 

Cell Line    Region Copy Number Increases                  Whole-arm gains 

BT-20        4q32-q34, 6q21-q22                            5p, 7p, 10p, 16q, 18p, 20q 
BT-474       17q12, 17q22-q24, 20q13                       - 
BT-483       12q24                                         1q, 8q, 19q, 20q 
MCF7         1cen-q32, 3p14, 8q21-qter, 15q21-qter,        5p, 12q, 14q 
             16q23-q24, 17q22-q24, 20q13 
MDA-157      1q32-qter, 5q32-qter, 13q31-qter,             2p, 7q, 8q, Xp 
             14q24-qter, 17q22-qter, 19q13.1, 
             19q13.4, 20q13 
MDA-175      11q13                                         1q, 8q, 20q 
MDA-231      -                                             4p, 6p, 11q, 19q 
MDA-330      3p26-qter, 5q31-qter, 7q22-q32,               5p, 10p, 14q 
             8q21-q23, 11p15, 11q13, 17q12, 
             17q21-qter, 20q13.2-qter 
MDA-361      6cen-q21, 12q21.3-q23, 12q24,                 5p, 8q, 12p, 16 
             17q22-qter, 19q13.3-q13.4, 20q13 
MDA-435      6q12-q13                                      3p14-qter, 8q, 20q 
MDA-436      3p22-pter, 5q31-qter, 8q22-qter,              1q, 5p, 16q, 21, 22 
             14q31-qter, 17q22-qter, 20q13 
MDA-453      1q31-qter, 3q26-qter, 17q22-q24               8q, 14q, 20, 22 
SK-BR-3      3p22-pter, 8q21, 8q23-q24.1,                  1q, 7pter-q31, 16p, 20q 
             10cen-q21, 13q22-qter, 14q31, 
             17q12, 17q24-qter 
ZR-75-1      11q13, 12q14-q15, 17q22-qter                  1q, 7p, 12p, 16p, 20q, 22 
ZR-75-30     8q23-qter, 17cen-q24                          1q, 5p, 20p 

High-level, regional copy number increases (green to red ratios exceeding 1.75) are 
shown in bold. 

TABLE 5. LOCI INVOLVED IN HIGH-LEVEL AMPLIFICATION OR 
FREQUENT LOW-LEVEL DNA SEQUENCE COPY NUMBER 
INCREASES IN 33 PRIMARY BREAST TUMORS AND 15 BREAST 
CANCER CELL LINES 

Primary tumors: 1q32, 6p23-pter, 6cen-p21.2, 6q12-q13, 7p21, 7cen-p12, 8q22-q23, 
8q24, 11q13, 12q21, 15q24, 15q26, 17q22-q24, 19q13.3-qter, 20q13 

Cell lines: 1q32, 3p22-pter, 3p14, 3q26-qter, 5q32-qter, 6q12-q13, 6q21, 8q21, 8q23- 
q24.1, 11q13, 12q21.3-q23, 12q24, 13q31-qter, 14q31, 17q12, 17q22-q24, 19q13.1, 
19q13.3-qter, 20q13 

Loci involved in both primary tumors and cell lines are shown in bold. Whole 
chromosome or chromosome arm changes were excluded from analysis. 

Example 5 

The CGH technique was carried out on primary bladder cancer using the 
methodology set forth above. In particular, CGH was applied for the analysis of DNA 
gains and losses in transformed uroepithelial cell lines and primary bladder carcinomas. 
The study employed five isogenic SV40-transformed uroepithelial cell lines (n=7) 
previously analyzed by conventional cytogenetics and 28 primary bladder carcinomas of 
varying stage and grade. 

The results are reported below: 



CGH RESULTS IN PRIMARY BLADDER CANCER (N = 28) 

Gains (common region)        Deletions (common region) 

18%  +8q (q21)               36%  -9q (q34) 
18%  +13q (q31-qter)         29%  -11p (p15) 
                             29%  -11q (q23-qter) 
                             25%  -8p (p22-pter) 
                             25%  -17p (p) 
                             21%  -3p (p21) 
                             21%  -9p (p) 
                             21%  -10q (q26) 
                             21%  -12q (q23-qter) 
                             21%  -16p (p12-pter) 

CORRELATION BETWEEN CYTOGENETICS AND CGH 
T1 high-grade cell line 

Cytogenetics: 7p15-pter+, 8q22-qter+, 012+, u qter-, 11cen-p13+ 
CGH: 7p15-pter+, 8q24-qter+, 9q13-qter+, 6q21-23- 

Cytogenetics: 3p-, 4p16-, 5p- 
CGH: 3p-, 4p16-, 5p-, 6q21-q22, 15q11-q13-15q11-, 19p13.1-qter-, y- 

90% agreement; other cell lines 91-100% 

ISOGENIC SV40-TRANSFORMED 
UROEPITHELIAL CELL LINES 

* All cell lines showed 5p-, 15q- and 9q+ 

* 8q23-q24+ => tumorigenicity 

* 3p- => high-grade 

* 8q21-qter - => high copy no. 

Figure 16 shows the gains and losses of DNA sequences in primary bladder 
carcinomas. 
Example 6 

The regions of common abnormality in ovarian cancer were mapped, the role of 
these abnormalities in ovarian cancer progression investigated and the process of 
identifying the involved genes begun. This was accomplished through the combined 
analysis of loss of heterozygosity (LOH), comparative genomic hybridization (CGH) and 
fluorescence in situ hybridization (FISH). Initial work validated our preliminary results 
and enabled analysis of the chromosomal changes leading to the abnormalities. Next, 
specific abnormalities were associated with grade. Early abnormalities may be 
diagnostically important and later abnormalities may predict metastatic disease and thus 
be useful in disease management. Finally, the manner in which the abnormalities 
revealed by these studies influence clinical and biological behavior was explored. 

The structural and functional gene dosage abnormalities in ovarian cancer that 
result from events such as mitotic recombination leading to allelic loss, gene 
amplification, physical deletion, etc. were investigated using three techniques: LOH, 
FISH and CGH. CGH has been particularly useful since this technique allows 
genome-wide mapping of regions of altered copy number (both increases and decreases from 
normal) in a single experiment without prior knowledge of the locations of regions of 
abnormality. Analysis of LOH complements CGH analysis since LOH detects allelic loss 
or imbalance due to mitotic recombination and loss plus duplication that would be missed 
by CGH. FISH has proved to be useful because it provides a direct measure of gene 
copy number in single cells and it permits detailed mapping of regions involved in copy 
number abnormalities. The number of physically mapped probes useful for FISH is 
already high as a byproduct of the human genome project. In addition, development of 
probes specifically for use with FISH is a central goal of the recently established 
LBL/UCSF Resource for Molecular Cytogenetics (co-directed by the Principal 
Investigator of this application). FISH analyses also provided validation for the CGH 
studies. For analysis of regions with increased copy number, FISH provides information 
about the mechanism of amplification that may be therapeutically useful. For example, it 
has been suggested that amplification in the form of double minutes can be eliminated by 
treatment with agents such as hydroxyurea. In addition, information about the structure 
of the amplicon may facilitate positional cloning. 

The CGH and LOH data for ovarian cancers are summarized by chromosome 
arm in Figure 17 for 30 Grade III tumors. LOH was assessed at 86 separate loci. The 
number of different abnormal sites is remarkably large (almost 11 different sites per 
tumor). Interestingly, the average number of different sites of abnormality per tumor in 
Grade I and Grade II was only about 3. These data reflect the dramatic genetic instability 
that develops as tumors progress and reveal the number of different abnormalities that 
occur. It is likely that these abnormalities contribute to tumor progression because of the 
strong association between LOH and/or gene dosage decrease and tumor suppressor gene 
inactivation and between gene dosage increase and oncogene activation. Several regions 
of gene dosage abnormality occurred in regions of the genome that are not associated with 
oncogenes or tumor suppressor genes now known to be involved in ovarian cancer. 
Thus, these studies set the stage for identification of new genes that play a role in the 
progression of ovarian cancer. 

The concordance between LOH and CGH detection of loss was high 
in our study (~80% at all loci), indicating that physical deletion was the most frequent 
mechanism by which gene dosage abnormality occurred in ovarian cancers. However, 
the concordance between our data and those of Sato is not as high. This suggests a 
possible ethnic and/or environmental difference between ovarian cancers in the Japanese 
and American populations. 

LOSS OF HETEROZYGOSITY. LOH was assessed in 50 surgically removed 
ovarian cancers at 86 loci distributed over every chromosome arm (except for the short 
arms of the acrocentric chromosomes). LOH was assessed by Southern analysis of 
restriction fragment length polymorphisms (RFLP) or PCR analysis of simple sequence 
repeat (SSR) polymorphisms. Analyses were carried out using DNA from each tumor 
and a corresponding peripheral blood sample. Details of the procedure and a description 
of the probes used are given in Yang-Feng T.L.; Han H.; Chen K.C.; Li S.B.; Claus 
E.B.; Carcangiu M.L.; Chambers S.K.; Chambers J.T.; Schwartz P.E. (1993) Allelic 
loss in ovarian cancer. Int. J. Cancer 54:546-551 (Yang-Feng et al.; 65, Appendix 1). 

Of the 50 tumors analyzed, 45 were of epithelial origin (30 serous, 4 mucinous, 
6 endometrioid, 4 mixed mullerian and one undifferentiated), 4 originated in the sex-cord 
stroma and 1 originated in the germ line. The 45 epithelial cases included 2 benign, 3 
borderline, 4 Grade I, 5 Grade II and 31 Grade III or higher. The findings in this study 
are shown graphically by chromosome arm in Figure 17. Regions showing consistent 
LOH included 13q (21/50), 17p (19/45), 17q (21/47) and Xp (18/44). Of course, not all 
loci were equally informative. For example, the locus with the most informative cases 
(i.e., cases showing two different alleles at the loci tested) was at 13q, while 8q and 12q were 
the regions with the fewest informative cases. Thus, some sites of consistent LOH may 
have been missed. 

COMPARATIVE GENOMIC HYBRIDIZATION. CGH is particularly useful 
for analysis of solid tumors. In this procedure, dual color fluorescence in situ 
hybridization is performed to normal metaphase spreads using differentially labeled DNA 
from the tumor and from a normal DNA sample. Unlabeled Cot-1 DNA is included in 
the hybridization mixture to competitively inhibit binding of repetitive sequences. DNA 
labeling, hybridization and probe detection are performed so that hybridized tumor DNA 
fluoresces green and hybridized normal DNA fluoresces red. The chromosomes also are 
stained with DAPI to facilitate chromosome identification. A digital imaging microscope 
and supporting software developed in our laboratory are used to record three color images 
(red, green and blue) and to measure ratios of green to red fluorescence along the 
metaphase chromosomes. In practice, several different metaphase spreads are analyzed in 
each analysis; chromosome profiles are calculated and combined to show the mean 
green:red ratio and standard deviation therein for each chromosome. At present, a region 
is considered to be increased (decreased) in copy number relative to normal when the 
lower (upper) standard deviation bound of the ratio is greater (less) than the normal value. 
Thus, regions of physical gene dosage abnormality throughout the tumor genome are 
detected and mapped in a single experiment. Relative changes in copy number are 
converted to absolute copy number changes by FISH with a few probes to unique 
sequences in the tumor. Studies of cell lines (600 MPE and COLO320) carrying known 
regions of increased and decreased copy number suggest that CGH allows detection of 
2- to 4-fold increases and decreases in copy number when the involved region is larger 
than 10-20 megabases. Detection of smaller regions of amplification has been achieved 
when the regions are amplified 5- to 10-fold. 
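
The standard-deviation criterion described above can be illustrated with a minimal sketch. It assumes one ratio value per metaphase for a given region, interprets the "lower (upper) standard deviation" as the mean minus (plus) one sample standard deviation, and takes 1.0 as the normal value; these are assumptions for illustration, and `call_region` is a hypothetical name.

```python
from statistics import mean, stdev
from typing import Sequence


def call_region(ratios: Sequence[float], normal: float = 1.0) -> str:
    """Call a region from green:red ratios measured in several metaphases.

    Assumption: the bounds are mean - SD and mean + SD, using the
    sample standard deviation across metaphases.
    """
    if len(ratios) < 2:
        raise ValueError("at least two metaphases are needed for a standard deviation")
    m = mean(ratios)
    s = stdev(ratios)
    if m - s > normal:
        return "increased"
    if m + s < normal:
        return "decreased"
    return "unchanged"
```

With this rule, a region whose ratios scatter around 1.0 is left uncalled even if a single metaphase shows a high value, because the one-standard-deviation interval still includes the normal value.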

CGH was performed on 38 of the 50 samples analyzed for LOH as described 
above. Samples analyzed using CGH included 2 benign, 3 borderline, 1 Grade I, 2 
Grade II and 30 Grade III. CGH analyses of 30 Grade III tumors are summarized in 
Figure 18. This figure illustrates the power of CGH for comprehensive analysis of gene 
dosage abnormality. Regions of increased copy number and decreased copy number are 
detected and the extent of the region of gene dosage abnormality is mapped. The average 
number of changes per tumor was 11. Abnormalities occurring in more than 30% of the 
tumors analyzed are summarized in Table 6. An immediate finding is that tumors with 
amplification at 3q26 show 18 abnormalities per tumor while tumors that are not 
amplified at 3q26 show ~6 abnormalities per tumor. Thus, amplification at 3q26 seems 
to be associated with genetic instability. The data for the lower grade tumors are 
insufficient to allow discrimination between specific early and late lesions. However, 
they show only ~3 aberrations per tumor. Patients with these tumors have a good 
prognosis. Thus, it will be interesting to learn whether Grade III tumors without 3q26 
amplification behave clinically more like the lower grade tumors. 



TABLE 6. 

Regions of frequent amplification and deletion 
in ovarian cancers detected by CGH or LOH 

Chromosome Region     Number of Cases   CGH Frequency   LOH Frequency 

Increased Copy No. 
3q26                  13                42% 
8q24                  11                35% 

Decreased Copy No. 
17q11.2-q21           18                58%             45% (21/47) 
17p12-p13             17                55%             42% 
1p35-1p36.1           14                45%             11% (3/27)* 
Xp21                  13                42%             41% (18/44) 
13q14-q21.1           13                42%             42% (21/50) 
8p21-p22              10                32%             26% (5/19) 
16q13-q23             10                32%             23% (5/22) 

* The probe tested was at 1p32. 


Table 6 also compares regions of genetic abnormality detected by CGH with 
those detected by analysis of LOH. In general, regions found to be consistently reduced 
in relative copy number also showed LOH. In fact, the concordance between LOH and 
reduced copy number using CGH is approximately 80% at most loci. Chromosome 19 is 
an exception. We currently believe that this low concordance is an artifact of CGH that 
seems to be specific to chromosome 19. Future studies with FISH will address this issue. 
The high concordance overall is significant since the LOH and CGH studies were 
conducted in a blind fashion in different laboratories to minimize the possibility that 
interpretation of the CGH karyotypes was influenced by the LOH results. This 
concordance suggests that the LOH in ovarian cancer is, for the most part, due to 
physical loss of one allele. This is important since it would mean that the regions of 
reduced copy number detected and defined by CGH can be used in the same way as 
information about LOH, that is, to indicate the locations of inactive tumor suppressor 
genes. This interpretation is supported by analyses of deletions of 17p and 16q in human 
breast cancer using CGH and FISH (see below) that guided efforts to positionally clone 
genes involved in tumor progression. 
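
As an illustration of how a concordance figure like the one quoted above might be computed at a single locus, the sketch below compares per-tumor LOH and CGH loss calls. The text does not define the calculation, so the definition used here (fraction of agreeing calls among tumors informative for both methods) and the name `concordance` are assumptions.

```python
from typing import Optional, Sequence


def concordance(loh_calls: Sequence[Optional[bool]],
                cgh_calls: Sequence[Optional[bool]]) -> float:
    """Fraction of tumors in which LOH and CGH agree at one locus.

    Each entry is True (loss detected), False (no loss) or None
    (uninformative or not analyzed). Assumption: tumors lacking a call
    from either method are left out of the comparison.
    """
    pairs = [(l, c) for l, c in zip(loh_calls, cgh_calls)
             if l is not None and c is not None]
    if not pairs:
        raise ValueError("no tumors with calls from both methods")
    agreeing = sum(1 for l, c in pairs if l == c)
    return agreeing / len(pairs)
```

A tumor that is homozygous at the tested marker gives no LOH information and therefore does not count for or against agreement in this sketch.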

Two other aspects of CGH and LOH data on gene dosage abnormality are also 
noteworthy: 1) No single site of gene dosage abnormality occurs in all tumors. Table 6, 
for example, shows that the most frequent events occur only in about 60% of all Grade 
III serous tumors. Thus, clinically homogeneous tumors are genetically heterogeneous. 
2) Many of the imbalances are correlated within individual tumors. Strong correlations 
(p<.01) have been observed for the aberration pairs: Xp-;3q+;13q-;8q+;8p-;17p-;17q-;8q+; 
13q-;17p-;Xp- and 13q-;8p-. These observations suggest a model for genetic 
progression like that illustrated in Figure 19, where the cancers progress along parallel 
pathways through the more-or-less serial accumulation of genetic aberrations, many of 
which confer the same phenotype. 

Corollaries of this model are that "parallel" genetic events that confer the same 
phenotype are likely to be uncorrelated while "serial" genetic events are likely to be 
correlated (e.g., aberrations 1, 4 and 9 are likely to be correlated while aberrations 1, 2 
and 3 are not). 
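
One way to test whether two aberrations co-occur within tumors more often than expected by chance is a one-sided Fisher's exact test on a 2x2 table of tumor counts. The text does not state which test produced the reported p-values, so the choice of test here is an assumption, and the function name and any counts passed to it are hypothetical.

```python
from math import comb


def fisher_one_sided(both: int, only_a: int, only_b: int, neither: int) -> float:
    """One-sided Fisher's exact p-value for co-occurrence of aberrations A and B.

    Arguments are tumor counts: with both aberrations, with A only,
    with B only, and with neither. The p-value is the hypergeometric
    probability of seeing at least `both` co-occurrences by chance.
    """
    n = both + only_a + only_b + neither
    with_a = both + only_a
    with_b = both + only_b
    k_max = min(with_a, with_b)
    favorable = sum(
        comb(with_a, k) * comb(n - with_a, with_b - k)
        for k in range(both, k_max + 1)
    )
    return favorable / comb(n, with_b)
```

A small p-value for a pair would be consistent with "serial" events in the model above, while pairs of "parallel" events would be expected to give large p-values.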

FLUORESCENCE IN SITU HYBRIDIZATION. Characterization of gene 
dosage abnormality using CGH or by analysis of LOH provides information about events 
that occur in most or all cells in the tumor population. In addition, both are limited in 
resolution to ~10 Mb by the availability of polymorphic, mapped probes (for LOH) or 
by the organization of metaphase chromosomes (CGH). FISH with well mapped probes 
complements LOH and CGH by providing higher resolution mapping and information about 
heterogeneity, level of amplification and mechanisms of amplification. We 
have performed several studies illustrating the use of FISH as an adjunct to CGH and 
LOH. 

Amplification: CGH studies suggest frequent amplification of sequences at 3q26 
and 8q24 and deletions of 3p. We applied dual color FISH with probes to these regions 
to confirm these findings and to provide more detailed information about the level of 
amplification. This experiment confirms the deletion of 3p and amplification at 3q. In 
addition, it extends the information available from CGH since it gives a more accurate 
estimate of the level of amplification. We also carried out hybridization with CMYC to a 
tumor detected as amplified at 8q24 using CGH. The results confirm that CMYC is 
amplified in these tumors. We have extended these studies by applying FISH with 
CMYC to other ovarian tumors showing no or low levels of CMYC amplification. 
These studies show substantial heterogeneity in the level of amplification among cells 
within each tumor as well as amplification in low frequency subpopulations in tumors that 
do not appear amplified by other techniques. A similar result was obtained during 
analysis of Her-2/neu amplification in breast cancers. Information about low frequency, 
genetically aberrant subpopulations may be useful in defining the biological importance of 
gene dosage abnormality since these abnormalities may confer an advantage on the 
subpopulation that may, with time, affect the biological and/or clinical behavior of the 
tumor. We also applied FISH with probes to CMYC to short term cultures of high grade 
ovarian tumors to determine whether amplification was intrachromosomal or 
extrachromosomal. So far, most tumors show amplification as intrachromosomal, 
suggesting that efforts to purge amplified sequences chemotherapeutically will be 
unsuccessful. However, additional work will be required to determine the chromosomal 
organization of other amplified sequences and to determine whether chromosome 
integration occurs late in tumor development. 

Positional cloning: The utility of FISH as an adjunct to CGH and analysis of 
LOH for definition of regions of gene dosage abnormality is illustrated by our work 
defining regions of reduced and increased copy number in breast cancer. These regions 
of gene dosage abnormality were initially recognized in CGH studies. Our approach has 
been to define these regions by application of FISH with probes distributed at intervals 
of a few megabases over the regions of interest. Probes for these studies were acquired as 
anonymous cosmid clones from the National Laboratory Gene Library Project and 
mapped with 2-3 Mb precision using our custom built semi-automatic digital imaging 
microscope. Generation of high density probes for mapping studies is straightforward 
with this system since probes can be mapped at the rate of a few per day. 

The region 20q13-20qter appeared to be consistently increased in copy number 
in breast cancer cell lines and in primary breast cancers. We have applied FISH with 
mapped probes to define the region and nature of amplification more precisely. 
Amplification was defined simply by counting the number of hybridization signals in 
interphase nuclei at each locus for which a mapped probe existed. Previous studies of 
Her-2/neu amplification indicated that this gives an accurate estimate of the level of 
amplification. Figure 20a shows the collection of probes mapped and used for this 
purpose. Figure 20b shows the level of amplification along chromosome 20 in the breast 
cancer cell line BT474. The level of amplification varied significantly as a function of 
position along the chromosome, suggesting a very complex amplification process with the 
maximum (>40-fold amplification) occurring in a 2 Mb region at 20q13.2. This 
localized the amplified region substantially and suggested a study in metaphase to 
investigate the mechanism by which the amplification occurred. The results of metaphase 
analysis show at least two different amplicon structures. Information about amplicon 
structure is important since it may identify simple amplicons that will facilitate positional 
cloning efforts (e.g., microdissection and cloning of sequences from one insertion site 
may define a particularly simple amplicon). 
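
The signal-counting approach described above can be sketched as follows. The text says only that hybridization signals were counted in interphase nuclei at each probed locus; expressing the result as a fold level relative to a reference copy number of two, and the function and probe names, are assumptions made for this illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def amplification_level(signal_counts: Sequence[int],
                        reference_copies: float = 2.0) -> float:
    """Mean FISH signals per nucleus, expressed relative to a reference.

    Assumption: a normal diploid locus gives two signals per nucleus.
    """
    if not signal_counts:
        raise ValueError("no nuclei were scored")
    return mean(signal_counts) / reference_copies


def amplification_profile(counts_by_probe: Dict[str, Sequence[int]],
                          reference_copies: float = 2.0) -> Dict[str, float]:
    """Amplification level for each mapped probe along a chromosome."""
    return {probe: amplification_level(counts, reference_copies)
            for probe, counts in counts_by_probe.items()}
```

Plotting such a profile against the mapped probe positions gives the kind of position-dependent picture described for chromosome 20.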

FISH with mapped probes also has proved useful in localizing the inactive genes 
that may have tumor suppressor function. Chromosome 16q appears to be consistently 
present in reduced copy number by CGH and to show LOH in cancers of the breast, 
prostate, bladder and ovary. We have applied FISH to characterize 16q in breast cancer 
using physically mapped cosmid probes. In these studies, dual color FISH was performed 
using a probe to the test locus and to the centromere of chromosome 16. Two patterns of 
hybridization were observed, one where the number of signals from the test 
probe was about the same as the number of signals from the centromere probe and one 
where the test signal to centromere signal ratio was less than one. Interestingly, the latter 
pattern correlates strongly with LOH while the former pattern does not. This is also true 
for chromosome 17 near the p53 locus. Thus, at these loci, LOH appears to be caused 
by physical deletion of one allele, indicating the location of an inactive tumor suppressor 
gene. These studies suggested 16q22.1 as the location of that gene. The cell adhesion 
molecule E-cadherin maps to that region. Thus, we selected a probe to E-cadherin and 
used this to explore E-cadherin deletions in 11 breast cancer cell lines. A high 
correlation was observed between deletion of one E-cadherin allele and loss of E-cadherin 
expression. However, the FISH studies indicate that one allele of E-cadherin remained in 
all lines. This suggested that the remaining allele of E-cadherin must be mutated. We 
tested this by analyzing the 16 exons of E-cadherin at the DNA sequence level using 
nonisotopic SSCP analysis and by DNA sequencing. All cell lines showing loss of 
E-cadherin expression showed inactivating mutations. The cell lines expressing E-cadherin 
appeared normal at the DNA sequence level. Thus, loss of cell adhesion leading to 
increased local invasion and/or metastasis is suggested as an important event involved in 
the progression of breast cancer. FISH was useful in localizing this inactivated gene 
since inactivation occurred by physical loss of one allele. Since ovarian cancers also 
show reduced copy number at 16q, it is reasonable to determine whether E-cadherin 
inactivation plays a role in the progression of this cancer as well. 
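
The two hybridization patterns described above, distinguished by the ratio of test-probe signals to centromere-probe signals, can be separated with a short sketch. Pooling the counts over all scored nuclei and the 0.1 tolerance around a ratio of one are assumptions for illustration; the text specifies only "about the same" versus "less than one".

```python
from typing import Sequence


def signal_ratio(test_counts: Sequence[int],
                 centromere_counts: Sequence[int]) -> float:
    """Ratio of test-probe to centromere-probe signals pooled over nuclei."""
    total_centromere = sum(centromere_counts)
    if total_centromere == 0:
        raise ValueError("no centromere signals were scored")
    return sum(test_counts) / total_centromere


def hybridization_pattern(ratio: float, tolerance: float = 0.1) -> str:
    """Label a locus as 'reduced' (ratio below one) or 'balanced'.

    Assumption: ratios within `tolerance` of 1.0 count as about the same.
    """
    return "reduced" if ratio < 1.0 - tolerance else "balanced"
```

In this sketch, a locus scored with one test signal and two centromere signals in most nuclei gives a ratio near 0.5 and is labeled "reduced", the pattern reported to correlate with LOH.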

The descriptions of the foregoing embodiments of the invention have been 
presented for purposes of illustration and description. They are not intended to be 
exhaustive or to limit the invention to the precise form disclosed, and obviously many 
modifications and variations are possible in light of the above teachings. The 
embodiments were chosen and described in order to best explain the principles of the 
invention and its practical application to enable thereby others skilled in the art to best 
utilize the invention in various embodiments and with various modifications as are suited 
to the particular use contemplated. It is intended that the scope of the invention be 
defined by the claims appended hereto. All references cited herein are incorporated by 
reference. 